Simulated annealing and the genetic algorithm are stochastic relaxation search techniques
suitable for application to a wide variety of combinatorial complexity nonconvex
optimization problems. Each produces a sequence of candidate solutions (or populations
of candidate solutions) to the underlying optimization problem, and the purpose of both
algorithms is to generate sequences biased toward solutions which optimize the objective
function.

The appeal of simulated annealing is that it provides asymptotic convergence to a
globally optimal solution. A substantial body of knowledge exists concerning the
algorithm convergence behavior. It is based upon a nonstationary Markov chain algorithm
model. No genetic algorithm model comparable in scope exists in the literature. This
work constitutes an attempt to provide such a model and accompanying convergence
theory by extrapolating the simulated annealing results onto the genetic algorithm. A
prerequisite, developed herein, is a nonstationary Markov chain genetic algorithm model.

The essence of the simulated annealing theory is demonstration of (1) existence of a
unique  asymptotic  probability  distribution  (stationary  distribution)  for  the  stationary  Mar- 
kov chain  corresponding  to  every  strictly  positive  constant  value  of  an  algorithm  control 
parameter  (absolute  temperature),  (2)  existence  of  a  stationary  distribution  limit  as  the 
control  parameter  approaches  zero,  (3)  the  desired  behavior  of  the  stationary  distribution 
limit  (i.e.  optimal  solution  with  probability  one)  and  (4)  sufficient  conditions  on  the  algo- 
rithm control  parameter  to  ensure  that  the  nonstationary  algorithm  achieves  (asymptoti- 
cally) the  limiting  distribution.  With  the  exception  of  (3),  this  work  adapts  that 
methodology  to  the  genetic  algorithm  Markov  chain  model  employing  a  genetic  operator 
parameter  (mutation  probability)  as  the  algorithm  control  parameter.  The  results  include  a 
mutation  probability  control  parameter  bound  analogous  to  (and  asymptotically  superior 
to)  the  conventional  simulated  annealing  parameter  bounds,  and  a  framework  for  repre- 
senting the  genetic  algorithm  stationary  distribution  components  at  all  consistent  fixed 
control  parameter  values,  including  zero. 

The  genetic  algorithm  stationary  distribution  limit  has  nonzero  components  corre- 
sponding to  all  solutions.  Thus,  the  simulated  annealing  global  optimality  convergence 
result  does  not  extrapolate.  However,  both  empirical  and  theoretical  evidence  is  provided 
which  suggests  that  the  desired  limiting  behavior  can  be  approached  by  suitably  adjusting 
the  algorithm  parameters. 

SECTION 1
INTRODUCTION 

1.1  Non-Convex  Combinatorial  Optimization  and  Stochastic  Search  Algorithms 
A  wide  variety  of  engineering  applications  lend  themselves  to  formulations  which 
require  the  solution  of  combinatorial  optimization  problems.  Typically,  the  optimization 
problem  is  nonconvex  and  is  defined  over  a  very  high  dimensionality  search  space  (e.g. 
inverse vision problems, in which an image array of 512 × 512 pixels at 8 bits/pixel might
be  encountered,  resulting  in  a  search  space  dimensionality  of  ~2M).  Consequently,  direct 
solution  is  usually  intractable. 

An  alternative  to  direct  solution  is  to  select  one  of  a  variety  of  iterative  improve- 
ment solution  techniques,  usually  some  variant  of  gradient  search.  But  by  definition, 
deterministic  iterative  improvement  techniques  terminate  in  local  extrema,  and  they 
ordinarily provide no means of assessing the amount by which the selected local
extremum deviates from the global extremum. A typical means of avoiding local extrema
entrapment  is  to  implement  the  iterative  improvement  solution  method  stochastically. 

The most commonly employed stochastic algorithm approach to combinatorial
optimization is simulated annealing [KiGe83, LaAa87], which is also sometimes referred to
as probabilistic hill climbing [RoSa85]. It exploits the analogy of combinatorial
optimization  to  the  annealing  of  crystalline  solids,  in  which  a  solid  is  cooled  very  gradu- 
ally from  some  elevated  temperature  and  thereby  allowed  to  relax  toward  its  low  energy 
states.  The  appeal  of  the  algorithm  class  derives  from  the  fact  that  provided  certain 
constraints on an algorithm control parameter (analogous to absolute temperature) are
observed, asymptotic convergence to a global extremum is guaranteed.


The  key  limitation  of  simulated  annealing  is  that  the  convergence  behavior  is 
asymptotic.  Thus  global  optimality  is  obtained  only  after  an  infinite  number  of  algorithm 
iterations.  The  rate  of  convergence  to  optimality  is  determined  by  a  nonnegative  algo- 
rithm control  parameter  whose  ideal  value  is  zero  and  which  must  observe  a  lower  bound 
in  order  to  assure  coherent  algorithm  behavior.  The  best  available  known  bound  for  the 
parameter,  the  annealing  schedule  bound,  is  of  the  form  K/log(k)  where  k  is  the  iteration 
index  and  K  is  a  parameter  independent  of  k  [GeGe84,  MiRo85]. 
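To make the slowness of a logarithmic schedule concrete, the following minimal Python sketch evaluates a control parameter of the form K/log(k). The function name `log_schedule` and the value K = 1 are illustrative assumptions, not quantities taken from [GeGe84, MiRo85].

```python
import math

def log_schedule(K: float, k: int) -> float:
    """Control parameter value K / log(k) at iteration k (defined for k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2 so that log(k) > 0")
    return K / math.log(k)

# K = 1.0 is an arbitrary illustrative constant, not a value from the text.
for k in (10, 10**3, 10**6, 10**9):
    print(k, log_schedule(1.0, k))
```

Because log(10^9) = 9 log(10), the value at k = 10^9 is only nine times smaller than the value at k = 10, which is why a schedule of this form approaches zero so slowly.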

Another  combinatorial  optimization  stochastic  search  technique  reported  in  the  lit- 
erature is  the  genetic  algorithm  [Davi87,  Gold83,  Gold89a,  Gref85,  Gref87].  It  emulates 
the  evolution  of  biological  systems  by  employing  a  set  of  stochastic  operators  (e.g. 
reproduction,  crossover  and  mutation)  to  transform  a  population  of  candidate  solutions  to 
the  underlying  optimization  problem  into  a  new  (descendent)  population.  It  has  some  fea- 
tures which  suggest  that  it  may  provide  significantly  improved  convergence  behavior 
over  simulated  annealing  on  certain  types  of  optimization  problems.  However,  the  nature 
of  the  genetic  operators  and  their  influence  on  algorithm  behavior  is  only  understood  in 
general  terms.  No  complete  theoretical  model  of  the  algorithm  exists  in  the  literature.  The 
fundamental  goal  of  the  work  reported  here  is  to  provide  a  theoretical  framework  for  ana- 
lyzing the  algorithm  based  upon  the  asymptotic  probability  distribution  of  the  solution 
sequences  which  it  produces.  The  work  reported  herein  includes  significant  progress  on 
the  key  intermediate  steps  to  achieving  that  goal. 
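The three operators named above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm model developed in later sections: candidate solutions are fixed-length bit strings, reproduction is fitness-proportional sampling, crossover is one-point, the population size is even, and the name `next_generation`, the fitness function, and all parameter values are hypothetical.

```python
import random

def next_generation(population, fitness, p_crossover, p_mutation, rng):
    """Transform a population of bit strings into a descendant population."""
    # Reproduction: sample parents with probability proportional to fitness
    # (fitness values are assumed strictly positive).
    weights = [fitness(ind) for ind in population]
    parents = rng.choices(population, weights=weights, k=len(population))

    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        # Crossover: with probability p_crossover, exchange tails at a random cut.
        if rng.random() < p_crossover:
            cut = rng.randrange(1, len(a))
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        children.extend([a, b])

    def mutate(ind):
        # Mutation: flip each bit independently with probability p_mutation.
        return "".join(
            ("1" if bit == "0" else "0") if rng.random() < p_mutation else bit
            for bit in ind
        )

    return [mutate(ind) for ind in children]

rng = random.Random(0)
pop = ["".join(rng.choice("01") for _ in range(8)) for _ in range(6)]
for _ in range(20):
    # Hypothetical objective: number of ones, offset by 1 to keep weights positive.
    pop = next_generation(pop, lambda s: 1 + s.count("1"), 0.7, 0.01, rng)
print(pop)
```

Each call maps one population to the next using only the current population, which is the property that later makes a Markov chain description natural.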

1.2  Organization 

The  remaining  sections  of  this  paper  are  organized  as  follows.  Sections  2  and  3  are 
background  reviews  of  the  simulated  annealing  and  genetic  algorithm  literature  respec- 
tively. Section  2  places  considerable  emphasis  on  the  methodology  employed  to  yield  the 
asymptotic  convergence  results  which  are  the  theoretical  foundation  of  the  simulated 
annealing  algorithm.  That  methodology  appeals  heavily  to  the  theory  of  inhomogeneous 
(nonstationary) Markov chains and their asymptotic state probability distributions. The
essence of the simulated annealing convergence theory is a set of sufficient conditions to
ensure  that  the  asymptotic  probability  distribution  of  the  Markov  chain  which  represents 
the  algorithm  is  independent  of  its  starting  state  and  has  probability  zero  for  all  states  cor- 
responding to  sub-optimal  solutions. 

Section  3  begins  with  a  verbal  description  of  the  three  fundamental  stochastic 
operators  employed  in  genetic  algorithms  (i.e.  reproduction,  crossover  and  mutation),  and 
proceeds  to  review  the  existing  theoretical  foundation  of  the  algorithm  class.  A  conclu- 
sion of  that  section  is  that  while  certain  important  theoretical  results  exist,  notably  the  so- 
called  schema  theorem  and  some  work  on  a  problem  construct  referred  to  as  the  minimal 
deceptive  problem,  the  genetic  algorithm  lacks  the  theoretical  foundation  necessary  to 
either  compare  it  with  simulated  annealing  or  to  answer  key  questions  concerning  the 
design  of  a  genetic  algorithm  for  a  given  application. 

The  author's  contribution  to  this  work  begins  with  Section  4.  The  major  result  of 
that  section  is  a  very  general,  nonstationary  Markov  chain  model  of  the  variants  of  the 
genetic  algorithm  which  employ  combinations  of  the  three  fundamental  genetic  algorithm 
operators.  The  model  is  tailored  to  resemble  that  employed  in  developing  the  simulated 
annealing  methodology,  and  in  that  regard,  the  genetic  algorithm  mutation  operator  is 
shown  to  provide  a  function  very  similar  to  that  of  the  simulated  annealing  absolute  tem- 
perature analog.  Specifically,  the  stationary  algorithm  corresponding  to  every  constant 
value of the mutation probability parameter satisfying $0 < P_m < 1$ possesses a unique
asymptotic  probability  distribution  (stationary  distribution). 

The key results of Section 4 are state transition matrices, formulated in terms of the
algorithm parameters and objective function of the underlying optimization problem, for
one,  two  and  three-operator  variants  of  the  algorithm.  Due  to  the  nature  of  the  genetic 
operators,  the  state  transition  matrices  exhibit  an  extremely  high  degree  of  symmetry. 
These  matrices,  and  some  key  related  results,  are  used  extensively  in  later  sections. 


Section  5  digresses  briefly  from  the  theoretical  development  to  produce  and 
examine  some  empirical  work  based  upon  the  algorithm  model.  The  presentation  is  not, 
nor  is  it  intended  to  be,  a  thorough  empirical  study.  It  is  provided  to  help  fix  some  of  the 
algorithm  model  state  space  and  asymptotic  probability  distribution  ideas  which  are  cen- 
tral to  this  work,  and  it  anticipates  some  of  the  theoretical  results  which  follow. 

Section  6  resumes  the  theoretical  development.  Its  result  is  an  expression  for  the 
components  of  the  unique  asymptotic  probability  distribution  produced  by  the  stationary 
algorithm  variants  which  implement  the  mutation  operator  with  nonzero  mutation  proba- 
bility (i.e.  the  stationary  two  and  three-operator  algorithm  variants).  The  result  is 
expressed  in  terms  of  Cramer's  Rule  and  thus  its  solution  requires  evaluation  of 
determinants. The determinants are the characteristic polynomials, evaluated at $\lambda = 1$, of
matrices  derived  from  the  state  transition  matrix  produced  in  Section  4  by  zeroing  one 
row.  A  later  section  attacks  the  problem  of  explicitly  solving  the  system,  based  upon  the 
highly  symmetrical  nature  of  the  state  transition  matrix,  but  some  very  significant  results 
are  obtainable  from  the  product  of  Section  6  without  explicit  solution. 
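One standard identity consistent with this description can be sketched in Python. It is offered as an illustration only and is not necessarily the formulation used in Section 6: for an irreducible stochastic matrix P, the stationary probability of state i is proportional to det(I - P_i), where P_i is P with row i zeroed, and det(I - P_i) is the characteristic polynomial of P_i evaluated at 1. The names `det` and `stationary_by_determinants` and the 3 × 3 example matrix are assumptions made for the sketch.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [row[:] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return sign * result

def stationary_by_determinants(P):
    """pi_i proportional to det(I - P_i), where P_i is P with row i zeroed."""
    n = len(P)
    dets = []
    for i in range(n):
        Pi = [[Fraction(0)] * n if r == i else list(P[r]) for r in range(n)]
        I_minus_Pi = [[(1 if r == c else 0) - Pi[r][c] for c in range(n)]
                      for r in range(n)]
        dets.append(det(I_minus_Pi))
    total = sum(dets)
    return [d / total for d in dets]

# A small strictly positive stochastic matrix chosen only for illustration.
P = [[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
     [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
     [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]]
pi = stationary_by_determinants(P)
print(pi)  # [Fraction(4, 11), Fraction(3, 11), Fraction(4, 11)]
```

Exact rational arithmetic is used so that the stationarity condition can be checked without rounding error. The cost of evaluating one determinant per state is the reason an explicit solution exploiting symmetry is attractive.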

An  essential  step  in  establishing  a  connection  between  simulated  annealing  and  the 
genetic  algorithm  is  demonstrating  the  existence  of  a  stationary  distribution  limit  for  the 
algorithm  as  the  mutation  probability  approaches  zero.  Section  7  accomplishes  that  task 
and  also  provides  a  foundation  for  deducing,  in  Section  8,  a  mutation  probability  bound 
analogous  to  the  annealing  schedule  bounds  of  the  simulated  annealing  algorithm.  The 
results  developed  in  Sections  7  and  8  apply  to  both  the  two  and  three-operator  algorithm 
variants. 

A  somewhat  surprising  result  produced  in  Section  7  and  anticipated  by  the  empiri- 
cal study  reported  in  Section  5  is  that  the  stationary  distribution  zero  mutation  probability 
limit  does  not  necessarily  isolate  globally  optimal  solutions.  In  fact,  it  provides  nonzero 
probability  for  all  solutions  of  the  underlying  optimization  problem  and  consequently  the 
extrapolation of the simulated annealing methodology is less than exact. However, both
the empirical results presented in Section 5 and some results developed later in Section 9
suggest  that  the  required  limiting  behavior  can  be  approached  as  closely  as  desired  by 
adjusting  the  algorithm  parameters  appropriately. 

Section  9  attacks  the  problem  of  explicitly  solving  the  system  which  results  from 
the  Cramer's  Rule  formulation  of  the  stationary  distribution  of  the  time-homogeneous 
two and three-operator algorithms. It is a very extensive development which yields an
expression for the coefficient of the general term in the Taylor's series expansion of the
required determinants. It is based upon the highly symmetrical nature of the state
transition matrix, as alluded to earlier.

The  results  of  Section  9  are  not  reduced  to  a  directly  useable  explicit  solution. 
Nevertheless,  they  do  provide  significant  insight  into  the  functional  form  of  the  stationary 
distribution  components.  Furthermore,  Section  9.5  points  out  some  very  significant  iden- 
tities which  exist  among  the  coefficients  of  the  Taylor's  series  and  suggests  a  method  for 
continuing  the  Section  9  development  based  upon  the  algebra  of  symmetric  and 
alternating  polynomials.  Explicit  solution  of  the  stationary  distribution  equations  is  the 
major  incomplete  task  required  for  extrapolation  of  the  simulated  annealing  convergence 
theory  onto  the  genetic  algorithm. 

Section  10  summarizes  this  work  and  recapitulates  the  significant  results.  It  also 
proposes  continuation  of  two  parts  of  this  research:  (1)  pursuit  of  the  stationary  distribu- 
tion solution  and  (2)  refinement  of  the  mutation  probability  control  parameter  bound. 

An  appropriate  mathematical  framework  for  examining  both  the  simulated 
annealing and genetic algorithms is the theory of Markov chains. Appendix A is included
to  summarize  some  essential  definitions  and  theorems.  Appendix  B  is  devoted  to  the 
Perron-Frobenius theorem, which is fundamental to the study of nonnegative matrices in
general  and  Markov  chains  in  particular.  Several  important  Markov  chain  theorems  are 
specializations  of  it  and  the  key  developments  in  Sections  6  and  7  require  its  application. 
All of the Appendix A and Appendix B results are provided without proof or elaboration,
but their foundation is obtainable from various references (e.g. [Cinl75] for the more
elementary results in Appendix A, [Sene81] for the Appendix B material on the
Perron-Frobenius Theorem and [Sene81, IsMa76] for the Appendix A ergodicity related
definitions  and  theorems).  These  results  are  invoked  freely  in  the  following  sections, 
either  by  specific  reference  to  definition/theorem  number,  or  if  the  context  makes  it 
appropriate,  they  are  simply  assumed. 

Appendix  C  is  provided  as  background  for  the  Section  9.5  discussion  on  coefficient 
identities  and  extending  the  stationary  distribution  representation  development.  With  the 
exception  of  Section  C.4,  the  material  presented  in  Appendix  C  is  obtainable  from 
advanced  algebra  texts  (e.g.  [MoSt64]).  The  symmetric/alternating  polynomial  general- 
ization in  Section  C.4  is  original. 

Appendix  D  collects  the  computer  program  listings  for  the  programs  employed  in 
generating  the  results  reported  in  Section  5.  The  programs  presented  there  were  devel- 
oped and  executed  on  the  Cray  Y-MP  operated  by  the  Computer  Science  Directorate  at 
Eglin AFB, FL.


SECTION  2 

SIMULATED  ANNEALING 

2.1  Overview 

As  noted  in  the  introduction,  a  very  commonly  employed  approach  to  the  solution 
of  nonconvex  combinatorial  optimization  problems  is  a  stochastic  relaxation  technique 
introduced  by  Kirkpatrick  et  al.  and  referred  to  as  simulated  annealing  [KiGe83].  The 
technique  is  so  named  by  virtue  of  its  analogy  to  the  annealing  of  solids,  in  which  a  crys- 
talline solid  is  heated  to  its  melting  point  and  then  allowed  to  cool  very  gradually  until  it 
is  again  in  the  solid  phase  at  some  nominal  temperature.  In  the  limiting  case  of 
infinitesimal  cooling  rate  and  absolute  zero  final  temperature,  the  resulting  solid  achieves 
its  most  regular  possible  crystal  lattice  configuration  (i.e.  minimum  lattice  energy  state), 
and  hence  is  free  of  crystal  defects.  Simulated  annealing  establishes  the  connection 
between  this  sort  of  thermodynamic  behavior  and  the  search  for  the  global  minimum  of 
an  objective  function  in  a  combinatorial  optimization  problem,  and  further,  it  provides  an 
algorithmic  means  of  exploiting  the  connection.  This  section  is  a  review  of  the  technique 
with  special  emphasis  on  known  results  which  bound  the  convergence  behavior  of  com- 
puter algorithms  belonging  to  the  class. 

2.2  Statistical  Mechanics  and  Annealing  of  Solids 

The  fundamental  assumption  of  statistical  physics  is  that  the  thermodynamic  behav- 
ior of  a  many  particle  system  can  be  represented  by  a  statistical  ensemble,  and  that  if  the 
system is in thermal equilibrium, the time averages of macroscopic thermodynamic
properties  of  the  system  are  equal  to  the  corresponding  ensemble  averages  (ergodicity 
hypothesis).  The  random  variable  represented  by  the  ensemble  is  the  system  thermal 
energy,  and  at  thermal  equilibrium  the  probability  distribution  is  completely  determined 
by the system temperature. The distribution is known as the Boltzmann distribution, or
alternatively as the Gibbs distribution, and its form is

$$\Pr\{E = E(i)\} = \frac{\exp\{-E(i)/kT\}}{Z(T)} \qquad \text{Eq. 2.1}$$


where

$E$ = the system thermal energy (a random variable)
$E(i)$ = the energy corresponding to state $i$
$k$ = Boltzmann's constant
$T$ = the system temperature
$Z(T)$ = the partition function.


The factor $\exp\{-E(i)/kT\}$ is called the Boltzmann factor. The partition function
provides the necessary normalization to make Eq. 2.1 a state occupancy probability. It
can be expressed as

$$Z(T) = \sum_{i} \exp\{-E(i)/kT\} \qquad \text{Eq. 2.2}$$

At  elevated  temperatures,  the  system  represented  by  the  probability  distribution  in 
Eqs. 2.1 and 2.2 occupies all states in its state space with nearly uniform probability, while at
low  temperatures,  states  having  low  energy  are  favored.  When  the  temperature 
approaches  absolute  zero,  only  states  corresponding  to  the  minimum  value  of  energy 
have  nonzero  probability.  Thus,  the  thermodynamic  system's  energy  function  can  be 
effectively  searched  for  its  minimum  value  by  starting  the  system  at  an  elevated  tempera- 
ture and  allowing  it  to  cool  gradually  to  absolute  zero,  at  which  point  one  of  its  minimum 
energy  states  is  occupied  with  probability  one.  This  is  the  mechanism  which  guides  the 
annealing  of  solids. 
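The temperature dependence described above can be checked numerically. The sketch below evaluates the distribution of Eqs. 2.1 and 2.2 for a small, hypothetical set of energy levels; the function name `gibbs`, the energy values, and the choice to pass the product kT as a single argument are assumptions made for illustration.

```python
import math

def gibbs(energies, kT):
    """State occupancy probabilities exp(-E(i)/kT) / Z(T)."""
    e_min = min(energies)  # shifting by the minimum avoids overflow; it cancels in the ratio
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)       # partition function of the shifted energies
    return [w / z for w in weights]

energies = [0.0, 0.0, 1.0, 2.0]  # hypothetical levels with two minimum-energy states
for kT in (100.0, 1.0, 0.01):
    print(kT, [round(p, 4) for p in gibbs(energies, kT)])
```

At kT = 100 the four probabilities are nearly equal, while at kT = 0.01 essentially all of the probability is shared equally by the two minimum-energy states.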

The cooling schedule employed in annealing solids is constrained by the requirement
that the system be allowed to achieve thermal equilibrium at each temperature. The
Gibbs distribution only represents the system's energy distribution in the stationary case
(i.e.  equilibrium).  If  this  requirement  is  not  satisfied,  defects  can  be  frozen  into  the  crystal 
lattice  preventing  the  system  from  achieving  the  minimum  possible  energy  state.  This 
behavior  is  analogous  to  local  minima  entrapment  in  combinatorial  optimization  search. 
The  restriction  on  the  annealing  schedule  necessary  to  avoid  it  is  the  fundamental  limita- 
tion on  the  annealing  technique. 

2.3  Combinatorial  Optimization  by  Simulated  Annealing 

Simulated  annealing  approaches  combinatorial  optimization  problems  in  a  closely 
analogous  fashion.  In  simulated  annealing,  the  optimization  problem's  solution  space  cor- 
responds to  the  state  space  of  the  analogous  thermodynamic  system  and  its  cost  function 
is  analogous  to  the  thermodynamic  system's  energy  surface.  The  analog  of  the 
thermodynamic  system's  temperature  is  a  nonnegative  algorithm  control  parameter,  T. 

Two  other  algorithm  components  are  also  required.  They  are  the  stochastic  next 
state  generation  and  acceptance  mechanisms,  and  they  incorporate  the  dependence  of  the 
algorithm  on  the  control  parameter,  T.  The  next  state  generation  mechanism  is  employed 
by  the  algorithm  to  transform  a  current  solution  into  a  new  candidate  solution,  and  the 
acceptance  mechanism  is  employed  to  decide  whether  to  retain  or  discard  the  proposed 
new  solution.  Together,  these  stochastic  operators  are  responsible  for  making  the  search 
algorithm  simulate  the  thermodynamic  system's  statistical  behavior.  Consequently,  they 
must  satisfy  certain  requirements  to  assure  coherent  algorithm  behavior.  These  require- 
ments are  explored  in  some  depth  later  in  the  context  of  algorithm  convergence  behavior. 

Conceptually,  the  operation  of  the  simulated  annealing  algorithm  can  be  described 
as follows. The algorithm starts at some initial value of the control parameter and with
some  initial  solution.  Then,  the  state  generation  mechanism  is  employed  to  synthesize  a 
new  candidate  solution.  The  new  solution  is  examined  by  the  acceptance  mechanism  and 
either  accepted  or  rejected.  If  it  is  accepted,  the  new  solution  becomes  the  current  solu- 
tion. Otherwise,  the  old  current  solution  is  retained.  This  process  is  repeated,  generating  a 
sequence of temporary solutions, until an approximate equilibrium is achieved in which
the  solution  space  occupancy  is  described  by  the  Gibbs  distribution  (Eq.  2.1-2).  Once  this 
approximate  equilibrium  is  achieved,  the  control  parameter  value  is  reduced  and  the  solu- 
tion sequence  is  extended  until  equilibrium  is  achieved  at  the  new  control  parameter 
value.  This  process  is  repeated  until  some  termination  condition  (e.g.  minimum  control 
parameter  value)  is  attained.  The  current  solution  at  termination  is  then  accepted  as  the 
solution  to  the  optimization  problem. 
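The conceptual description above can be sketched in Python. This is a minimal illustration under stated assumptions rather than a definitive implementation: a fixed number of transitions per control parameter value stands in for "approximate equilibrium", the acceptance rule is the Metropolis criterion given later as Eq. 2.4, the cooling sequence is geometric, and the cost table and the name `simulated_annealing` are hypothetical.

```python
import math
import random

def simulated_annealing(cost, neighbors, initial, temperatures, steps_per_temperature, rng):
    """Return the current solution when the control parameter sequence is exhausted."""
    current = initial
    for T in temperatures:                       # nonincreasing control parameter values
        for _ in range(steps_per_temperature):   # stand-in for "approximate equilibrium"
            candidate = rng.choice(neighbors(current))   # next state generation
            delta = cost(candidate) - cost(current)
            # Acceptance (assumed Metropolis rule): always accept improvements,
            # accept a cost increase with probability exp(-delta / T).
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                current = candidate
    return current

costs = [5, 3, 4, 1, 6, 7, 2, 8, 0, 9]               # hypothetical cost function on 10 states
n = len(costs)
temperatures = [4.0 * 0.8 ** m for m in range(30)]   # assumed geometric cooling
result = simulated_annealing(
    cost=lambda i: costs[i],
    neighbors=lambda i: [(i - 1) % n, (i + 1) % n],  # states arranged on a ring
    initial=0,
    temperatures=temperatures,
    steps_per_temperature=50,
    rng=random.Random(1),
)
print(result, costs[result])
```

A finite run of this kind returns whatever solution is current at termination; nothing in the sketch guarantees that it is the global minimum.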

It  is  noted  in  passing  that  simulated  annealing  always  involves  minimizing  a  cost 
functional,  never  maximizing  a  reward.  However,  this  causes  no  loss  of  generality 
because  any  combinatorial  optimization  problem  can  be  translated  into  an  equivalent 
minimization  problem. 

2.4  Theoretical  Foundations  of  Simulated  Annealing 

The  evolution  of  the  search  sequence  of  a  simulated  annealing  algorithm  as  out- 
lined, in  which  each  succeeding  solution  in  the  sequence  is  determined  stochastically 
based  upon  the  current  solution,  suggests  that  the  algorithm  behavior  can  be  described  as 
a  Markov  chain.  Indeed  it  can,  and  all  of  the  known  convergence  results  for  simulated 
annealing  algorithms  are  derived  from  analysis  of  Markov  chain  models  [LaAa87, 
GeGe84,  LuMe86,  MiRo85,  Rior58].  This  subsection  establishes  a  Markov  chain  model 
to  represent  the  simulated  annealing  algorithm  and  then  employs  it  in  reviewing  the 
development  of  the  published  convergence  bounds.  This  development  essentially  follows 
[LaAa87]. 

2.4.1 A Markov Chain Model of Simulated Annealing

Let a combinatorial optimization problem be represented by the pair $(S, C)$ where $S$
is the problem's solution space and $C$ is its cost function, and assume without loss of
generality that the optimization problem requires minimization of $C$. Also, assume that $S$ is
finite. Then, a simulated annealing algorithm for solving this problem can be
characterized by the quadruple $(S, i_0, P_T, \tau)$ where $S$ is as defined above and where $i_0 \in S$
is an initial candidate solution, $P_T$ is a stochastic matrix which describes a stochastic state
transition mechanism (the composition of the next state generation and acceptance
mechanisms discussed in Section 2.3) and $\tau = \{T_k\}$ is a finite length monotone nonincreasing
sequence of positive control parameter values. The first parameter value in $\tau$ is $T_0$ and the
final value is $T_f$. $P_T$ incorporates the algorithm dependence on both $C$ and $T$.

The algorithm generates a sequence of candidate solutions, $\{i_k : 0 \le k \le f\}$, by
employing the state transformation mechanism (described by $P_T$) to transform solution $i_k$
into $i_{k+1}$. At the $k$-th transition, $P_T$ is completely determined by $T_k$. The solution sequence is
extended until $T = T_f$, at which point the current solution, $i_f$, is accepted as the solution to
the combinatorial optimization problem. Thus $T_f$ signals algorithm termination. $T_f$ can be
allowed to depend on $\{i_k\}$ provided due regard is paid to the requirement for termination.
Since the solution state transition mechanism is stochastic, and since the conditional
dependence of the solution sequence only extends to one transition, the solution sequence
is a Markov chain by Definition A1. Its state transition matrix is $P_T$ (Definition A4).

The state transition matrix is decomposed into two parts for convenience in the
following. It consists of the next state generation mechanism, $G_{ij}(T)$, which describes the
probability of generating state $j$ given that the current state is $i$, and the state acceptance
mechanism, $A_{ij}(T)$, which describes the probability of accepting the generated state. Thus,
$P_T(i,j)$ is written as

$$P_T(i,j) = \begin{cases} G_{ij}(T)\,A_{ij}(T) & j \ne i \\ 1 - \sum_{l=1,\, l \ne i}^{N} G_{il}(T)\,A_{il}(T) & j = i \end{cases} \qquad \text{Eq. 2.3}$$


In  this  result,  N  =  card(S)  represents  the  cardinality  of  the  solution  space. 

It is noted in passing that the usual form of the state acceptance mechanism is the
so-called Metropolis criterion [Metr53], given by

$$A_{ij}(T) = \min\left\{1, \exp\left(\frac{-(C(j) - C(i))}{T}\right)\right\} \qquad \text{Eq. 2.4}$$

This is the form employed by Kirkpatrick et al. in the original work [KiGe83] and most
others published since are variations of it. Also, the usual form of the next state
generation mechanism is

$$G_{ij}(T) = G_{ij} = G_{ji} = \begin{cases} 1/N_i & j \in S_i \\ 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 2.5}$$

where $S_i \subset S$ is the set of states accessible from state $i$ in one transition (by definition,
$i \notin S_i$), and where $N_i = \text{card}(S_i)$. Note that $G_{ij}$ defined by Eq. 2.5 is symmetric and
independent of $T$.
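As a concrete instance of Eqs. 2.3 through 2.5, the following Python sketch assembles the state transition matrix for a small problem. The four-state cost function, the ring-shaped neighborhood structure (so that every $N_i = 2$), and the name `transition_matrix` are assumptions made for illustration.

```python
import math

def transition_matrix(costs, neighbors, T):
    """P_T(i, j) per Eq. 2.3 with Metropolis acceptance and uniform generation."""
    n = len(costs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            g = 1.0 / len(neighbors[i])                      # Eq. 2.5: 1 / N_i
            # Eq. 2.4: min{1, exp(-(C(j) - C(i)) / T)}, written as a branch so that
            # exp is never evaluated at a large positive argument.
            a = 1.0 if costs[j] <= costs[i] else math.exp(-(costs[j] - costs[i]) / T)
            P[i][j] = g * a
        # Eq. 2.3, case j = i: probability of remaining in state i.
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P

costs = [3.0, 1.0, 2.0, 0.0]                    # hypothetical cost function C
neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]    # ring; i is never a member of S_i
P = transition_matrix(costs, neighbors, T=1.0)
for row in P:
    print([round(x, 4) for x in row])
```

Every row sums to one by construction, and the diagonal entry of a row collects the probability of all rejected proposals from that state.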

2.4.2 Asymptotic Convergence Behavior

The subject of interest in the remainder of this section is a set of sufficient
conditions on $P_T$ and $\tau$ to ensure that an optimal solution is achieved. These conditions will
prove to guarantee asymptotic convergence only (i.e. $\tau$ must be an infinite sequence,
which  of  course  violates  the  termination  requirement  of  the  algorithm).  Two  cases  will  be 
examined.  The  first  only  involves  time-homogeneous  (stationary)  Markov  chains  (Defini- 
tion A5)  and  is  presented  due  to  its  relative  ease  of  analysis.  Its  purpose  is  to  provide  a 
foundation  for  the  essential  ideas  involved  in  the  second  case,  which  requires  an  appeal  to 
ergodicity  theorems  for  inhomogeneous  (nonstationary)  Markov  chains.  The  useable  con- 
vergence behavior  results  which  are  the  goal  of  this  effort  derive  from  analysis  of  the  sec- 
ond case. 

The  first  (simple)  algorithm  is  represented  as  a  sequence  of  solutions  evolving  as  a 
sequence  of  distinct  Markov  chains.  Each  Markov  chain  in  the  sequence  executes  at  a 
fixed  control  parameter  value  (and  hence  is  time-homogeneous)  and  each  succeeding 
Markov  chain  executes  at  a  lower  (but  strictly  positive)  parameter  value.  Thus,  in  the 
sequence $\tau$, each distinct parameter value, $T_l$, is associated with a distinct
time-homogeneous Markov chain and $T_l$ occurs at some large number of consecutive locations,
$K_l$, in $\tau$. This case is hereafter referred to as the homogeneous (or stationary) algorithm.
The analysis of the convergence behavior of the homogeneous algorithm includes the
hypothesis that each Markov chain in the sequence achieves its stationary distribution.
This hypothesis is equivalent to $K_l \to \infty$ for all $l$ (Definition A10, Theorem A3, Theorem
A4).

In  the  second  case,  the  algorithm  is  represented  as  a  sequence  of  solutions  evolving 
as  a  single  inhomogeneous  (nonstationary)  Markov  chain.  This  formulation  is  hereafter 
referred  to  as  the  inhomogeneous  (or  nonstationary)  algorithm.  In  the  inhomogeneous 
algorithm, the control parameter value is allowed (though not required) to decrease
after each state transition. The dependence of $G_{ij}(T)$ and $A_{ij}(T)$ on $T$ results in
the  inhomogeneous  behavior. 

2.4.2.1 The Homogeneous Algorithm

In  the  homogeneous  algorithm,  the  means  of  establishing  the  requirements  for 
asymptotically  optimal  convergence  is  to  first  establish  sufficient  conditions  for  existence 
of  the  stationary  distribution  of  each  Markov  chain  and  then  to  establish  sufficient  condi- 
tions to  ensure  that  the  stationary  distribution  converges  to  a  uniform  distribution  over  the 
set of optimal solutions as the control parameter value approaches zero. That is,

$$\lim_{T \to 0} q_T(i) = \begin{cases} 1/N_{opt} & i \in S_{opt} \\ 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 2.6}$$

where $q_T$ is the stationary distribution of the Markov chain executing at control parameter
value $T$, $S_{opt} \subset S$ is the set of solutions $i \in S : C(i) = C_{opt}$, and $N_{opt} = \text{card}(S_{opt})$.
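The limit in Eq. 2.6 can be illustrated numerically. The sketch below relies on a standard detailed-balance result that is assumed here rather than taken from the text above: with symmetric generation probabilities and Metropolis acceptance, the stationary distribution is proportional to exp(-C(i)/T). The code checks that this candidate distribution is left unchanged by one transition and then evaluates it as T decreases. The five-state cost function, the ring neighborhood, and all function names are hypothetical.

```python
import math

def metropolis_matrix(costs, neighbors, T):
    """Transition matrix with uniform generation and Metropolis acceptance."""
    n = len(costs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            accept = 1.0 if costs[j] <= costs[i] else math.exp(-(costs[j] - costs[i]) / T)
            P[i][j] = accept / len(neighbors[i])
        P[i][i] = 1.0 - sum(P[i])   # diagonal is still 0 here because i is not its own neighbor
    return P

def gibbs(costs, T):
    """Candidate stationary distribution, proportional to exp(-C(i)/T)."""
    c_min = min(costs)
    w = [math.exp(-(c - c_min) / T) for c in costs]
    z = sum(w)
    return [x / z for x in w]

def stationarity_residual(q, P):
    """Largest component of |qP - q|; zero when q is stationary for P."""
    n = len(q)
    return max(abs(sum(q[i] * P[i][j] for i in range(n)) - q[j]) for j in range(n))

costs = [0.0, 2.0, 1.0, 0.0, 3.0]                       # two optimal solutions: states 0 and 3
neighbors = [[4, 1], [0, 2], [1, 3], [2, 4], [3, 0]]    # ring, so every N_i = 2
for T in (10.0, 1.0, 0.1):
    q = gibbs(costs, T)
    residual = stationarity_residual(q, metropolis_matrix(costs, neighbors, T))
    print(T, [round(x, 4) for x in q], residual)
```

At T = 0.1 the probabilities of the two optimal states are each within 0.001 of 1/2 and the remaining states carry negligible probability, in agreement with Eq. 2.6 for $N_{opt} = 2$.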

Theorems A1-A3 can be employed to deduce sufficient conditions on $P_T(i,j)$ (or
alternatively on $G_{ij}(T)$ and $A_{ij}(T)$) to ensure the existence of the stationary distribution of
each Markov chain in the sequence representing the homogeneous algorithm. Since only
combinatorial (finite solution space) optimization problems are under consideration and
since by definition the homogeneous algorithm only employs time-homogeneous Markov
chains, the finite state space and time-homogeneity requirements of Theorem A3 are
satisfied. Beyond these requirements, existence of the stationary distribution of each
Markov chain in the homogeneous algorithm only requires that the chain produced by $P_T$ be
irreducible and aperiodic (Definitions A7 and A9).

If $A_{ij}(T)$ is selected as the Metropolis criterion, Eq. 2.4, then

$$\forall i,j \in S,\ \forall T > 0 : A_{ij}(T) > 0.$$

Thus, from Eq. 2.3, the irreducibility requirement is transferred to the next state generation
mechanism, $G_{ij}(T)$. Note that from Theorem A1, irreducibility can readily be
achieved within the definition supplied by Eq. 2.5. Also, in [MiRo85], Theorem A2 is
used to show that a sufficient condition for aperiodicity is

$$\forall T > 0\ \exists\, i,j \in S \text{ such that } A_{ij}(T) < 1.$$

This condition is satisfied by the Metropolis criterion provided the trivial case indicated
by

$$\forall i,j \in S : C(i) = C(j) = C_{opt}$$

is excluded, because then $k$, $l$ always exist such that

$$C(l) = C_{opt} < C(k). \tag{Eq. 2.7}$$

The sufficient condition on $A_{ij}(T)$ can then be met by selecting $i = l$ and $j = k$ (use Eq.
2.7 in Eq. 2.4).

Although existence of the stationary distribution (or at least sufficient conditions on
$G_{ij}(T)$ and $A_{ij}(T)$ to ensure its existence) is now established, and examples of $G_{ij}$ and $A_{ij}$
which meet these conditions have been provided, actually achieving the stationary distribution is
only guaranteed after an infinite number of state transitions. This is equivalent to the thermal
equilibrium constraint on the temperature schedule for annealing solids discussed in
Section 2.2. Each Markov chain in the sequence representing the homogeneous algorithm
is subject to this requirement, and consequently must be of infinite length.

Next, sufficient conditions to assure convergence of the stationary distribution of
the final Markov chain in the homogeneous algorithm to the desired optimal distribution
(Eq. 2.6) are established. First, note that if the stationary distribution of a Markov chain
in the sequence exists, then a function $g(C(i),T)$ corresponding to that Markov chain
exists such that

$$q_T(i) = \frac{g(C(i),T)}{\sum_j g(C(j),T)} \tag{Eq. 2.8}$$

where $g$ satisfies

$$\begin{aligned}
&(1)\quad \forall i \in S,\ \forall T > 0 : g(C(i),T) > 0 \\
&(2)\quad \forall j \in S : \sum_{i \neq j} g(C(i),T)\, G_{ij}(T)\, A_{ij}(T) = g(C(j),T) \sum_{i \neq j} G_{ji}(T)\, A_{ji}(T)
\end{aligned} \tag{Eq. 2.9}$$

This can be deduced by noting that the uniquely determining conditions on $q$ expressed in
Theorem A3 are met by $g$ satisfying Eq. 2.8 and 2.9. Eq. 2.9 is called the global balance
equation. Close examination reveals that it is exactly the necessary condition for equilibrium
state occupancy. A more restrictive condition, in which the balance holds for every
pair of states on a pairwise basis, is called the detailed balance equation.

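It can be shown that the following additional constraints on $g$ guarantee convergence
of the stationary distribution to the optimal (i.e. to Eq. 2.6) [MiRo85]. Note that
Eq. 2.10(2) requires an exponential form.

$$\begin{aligned}
&(1)\quad \lim_{T \to 0} g(\Delta,T) = \begin{cases} 0 & \Delta > 0 \\ \infty & \Delta < 0 \end{cases} \\
&(2)\quad \frac{g(\Delta_1,T)}{g(\Delta_2,T)} = g(\Delta_1 - \Delta_2,\,T) \\
&(3)\quad \forall T > 0 : g(0,T) = 1
\end{aligned} \tag{Eq. 2.10}$$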

Collectively, Eq. 2.8-2.10 provide a set of sufficient conditions on $G_{ij}(T)$ and $A_{ij}(T)$
to assure convergence of the stationary distribution to Eq. 2.6. The key condition, the
global balance equation, is implicit however, and thus is very difficult to apply. Nevertheless,
it can be shown [LaAa87] that if $G_{ij}(T)$ and $A_{ij}(T)$ defined by Eq. 2.4 and Eq. 2.5 are
employed, the conditions are satisfied, and that the corresponding stationary distribution
is provided by

$$\forall i \in S : q_T(i) = \frac{\exp\{-(C(i) - C_{opt})/kT\}}{\sum_j \exp\{-(C(j) - C_{opt})/kT\}}. \tag{Eq. 2.11}$$

The key to that development is that the $G_{ij}(T)$ and $A_{ij}(T)$ of Eq. 2.4 and 2.5 satisfy the
detailed balance equation, the symmetry of $G_{ij}$ being a critical consideration.
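
The stationary distribution claim can be checked numerically on a toy problem. The following is a minimal sketch only: Eq. 2.4 and Eq. 2.5 are not reproduced in this section, so the code assumes the usual Metropolis acceptance rule min{1, exp(-(C(j) - C(i))/T)} and a uniform generation probability over symmetric neighborhoods of equal size, with the Boltzmann constant k absorbed into T. The four-state cost vector, the ring neighborhood and all function names are hypothetical.

```python
import math

def metropolis_chain(costs, neighbors, T):
    """Build the one-step transition matrix P_T as a list of rows."""
    n = len(costs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            g = 1.0 / len(neighbors[i])                         # assumed uniform G_ij
            a = min(1.0, math.exp(-(costs[j] - costs[i]) / T))  # assumed Metropolis A_ij(T)
            P[i][j] = g * a
        P[i][i] = 1.0 - sum(P[i])       # rejected proposals leave the state unchanged
    return P

def boltzmann(costs, T):
    """Boltzmann form of Eq. 2.11 with k absorbed into T."""
    c_opt = min(costs)
    weights = [math.exp(-(c - c_opt) / T) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical problem: four states on a ring, each with two neighbors.
costs = [3.0, 1.0, 2.0, 1.0]
neighbors = [[1, 3], [0, 2], [1, 3], [2, 0]]
T = 0.7
n = len(costs)

P = metropolis_chain(costs, neighbors, T)
q = boltzmann(costs, T)

# Detailed balance: q(i) P(i,j) = q(j) P(j,i) for every pair of states.
max_detailed = max(abs(q[i] * P[i][j] - q[j] * P[j][i])
                   for i in range(n) for j in range(n))

# Global balance: q is a left eigenvector of P_T with eigenvalue 1.
qP = [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]
max_global = max(abs(qP[j] - q[j]) for j in range(n))

print("largest detailed balance residual:", max_detailed)
print("largest global balance residual:  ", max_global)
```

Both residuals should vanish up to floating point rounding. Making the neighborhoods asymmetric (for example, giving one state three neighbors) is a quick way to see why the symmetry of the generation mechanism matters.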

The behavior required by Eq. 2.10(1) is limiting behavior as $T \to 0$. Thus, these
conditions assure convergence to the global minimum with probability one (i.e. convergence
of the stationary distribution to Eq. 2.6), only if the sequence of Markov chains is
infinite and $\lim_{l \to \infty} T_l = 0$. Recalling that a guarantee of achieving the stationary distribution
requires that each Markov chain be of infinite length, the homogeneous algorithm is seen
to require a doubly infinite sequence of solutions composed of an infinite sequence of
infinitely long Markov chains.
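
The limiting behavior as the control parameter approaches zero can be illustrated by evaluating the Boltzmann form of the stationary distribution at a decreasing set of temperatures. This is a sketch under stated assumptions: the Boltzmann constant is absorbed into T, and the cost vector (with two optimal states) is hypothetical.

```python
import math

def boltzmann(costs, T):
    """Stationary distribution of the Boltzmann form, with k absorbed into T."""
    c_opt = min(costs)
    weights = [math.exp(-(c - c_opt) / T) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

costs = [3.0, 1.0, 2.0, 1.0]    # hypothetical costs; states 1 and 3 are optimal

for T in (10.0, 1.0, 0.1, 0.01):
    q = boltzmann(costs, T)
    print(f"T = {T:<5} q_T = {[round(p, 4) for p in q]}")

q_cold = boltzmann(costs, 0.01)
q_hot = boltzmann(costs, 10.0)
```

At high temperature the distribution is nearly uniform over all states; as T decreases, the mass concentrates on the optimal states and is shared equally among them, which is the uniform distribution over the optimal set described by Eq. 2.6.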
2.4.2.2  The  Inhomogeneous  Algorithm 

The behavior of the homogeneous algorithm, which requires that an infinite number
of transitions be executed at each control parameter value, is clearly impractical. The
following reviews two published convergence results which extend the ideas developed
for the homogeneous algorithm to the inhomogeneous counterpart [GeGe84, MiRo85].
These results adopt the sufficient conditions on $G_{ij}(T)$ and $A_{ij}(T)$ developed for the homogeneous
algorithm as a starting point (i.e. irreducibility, aperiodicity and Eq. 2.8-2.10)
and extend them to the case in which each time-homogeneous Markov chain is of finite
length (i.e. to the inhomogeneous algorithm). The key products of this effort are lower
bounds on the algorithm control parameter's approach to zero. In both cases discussed
here, the bound is of the form $K/\log(k)$ where $k$ is the index of the Markov chain representing
the inhomogeneous algorithm and $K$ is independent of $k$. The following is a brief
sketch of the approach taken to arrive at these results. It is common to both.

Given that $G_{ij}(T)$ and $A_{ij}(T)$ are selected as in Eq. 2.4 and 2.5, each state transition
matrix in the inhomogeneous Markov chain of the inhomogeneous algorithm satisfies all
of the sufficient conditions for stationary distribution existence and asymptotic convergence
to optimality developed for the homogeneous algorithm (i.e. irreducibility, aperiodicity
and Eq. 2.8-2.10). Further, the explicit form of the resulting stationary
distribution is given by Eq. 2.11. Thus, for each transition matrix, $P_{T_k}$, there exists an
eigenvector, $q_{T_k}$, having eigenvalue 1 and satisfying the probability vector conditions.
Further, $q_{T_k}$ converges to the limiting distribution of Eq. 2.6 as $T_k \to 0$. Consequently,
Theorem A7 can be used to establish strong ergodicity (and hence the desired convergence
behavior for $T_k \to 0$) provided (1) that weak ergodicity can be established and (2)
that the inequality appearing in Theorem A7 obtains.

Under the hypothesis that $G_{ij}(T)$ and $A_{ij}(T)$ are defined in accordance with Eq. 2.4
and 2.5, in which case the required eigenvector is explicitly provided by Eq. 2.11, and
that condition (1) (weak ergodicity) is satisfied, both [GeGe84] and [MiRo85] prove condition
(2) of the above. The development is straightforward but tedious. Of more interest
here is the means of establishing condition (1), because it leads to the annealing schedule
bound.

Both developments employ Theorem A6 to establish weak ergodicity. The general
approach is to use the definitions of $G_{ij}(T)$ and $A_{ij}(T)$, along with bounds on the extrema
of either the cost function [GeGe84] or the slope of the cost function [MiRo85] to define
bounds on the one step transition probabilities. The transition probability bound is then
employed to arrive at an upper bound on the $\tau_1$ coefficient of ergodicity of Theorem A5,
which is used in turn in Theorem A6 to deduce a sufficient condition to guarantee weak
ergodicity. The condition is in the form of a lower bound on the annealing schedule.

The first such result to be published is in [GeGe84]. The resulting bound is

$$T_k \geq \frac{N \times (C_{max} - C_{min})}{\log(k)}, \quad k \geq 2 \tag{Eq. 2.12}$$

where $C_{max}$ and $C_{min}$ are the maximum and minimum values respectively of $C(i)$ for $i \in S$
and $N = \mathrm{card}(S)$. Thus, $C_{min}$ is the desired $C_{opt}$.

The annealing schedule bound established in [MiRo85] is more refined than that of
Eq. 2.12. It is given by

$$T_k \geq \frac{r L}{\log(k)}, \quad k \geq 2 \tag{Eq. 2.13}$$

where $r$ is the radius of the graph defining the accessible state neighborhoods of the next
state generation mechanism (i.e. the $\{S_i\}$ where $S_i \subset S$ is defined in Eq. 2.5), and $L$ is a
constant which bounds the local slope of the cost function. Specifically, $r$ and $L$ are given
by

$$r = \min_{i \in S - S_{max}} \max_{j \in S} d(i,j) \tag{Eq. 2.14}$$

where $d(i,j)$ is the distance of $j$ from $i$, measured by the minimum number of state transitions
required to arrive at $j$ starting at $i$, where $S_{max} \subset S$ is the set of local maxima of $C$,
and

$$L = \max_{i \in S} \max_{j \in S_i} |C(j) - C(i)|. \tag{Eq. 2.15}$$

Note that in the special case $S_i = S$ for all $i \in S$, Eq. 2.14 and Eq. 2.15 reduce to $r = 1$
and $L = C_{max} - C_{min}$ respectively, and substitution into Eq. 2.13 yields

$$T_k \geq \frac{C_{max} - C_{min}}{\log(k)}. \tag{Eq. 2.16}$$

The Eq. 2.16 result is smaller than that of Eq. 2.12 by the factor $1/N$.
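
To give a feel for how slowly a schedule of the form K/log(k) cools, the two bounds can be evaluated directly. The sketch below assumes a natural logarithm and uses a hypothetical problem with 32 states and costs between 1 and 3; the function names are illustrative and not taken from the cited papers.

```python
import math

def geman_bound(k, n_states, c_max, c_min):
    """Lower bound on T_k of the [GeGe84] form, N (C_max - C_min) / log(k), for k >= 2."""
    return n_states * (c_max - c_min) / math.log(k)

def mitra_bound(k, r, slope):
    """Lower bound on T_k of the [MiRo85] form, r L / log(k), for k >= 2."""
    return r * slope / math.log(k)

def transitions_to_reach(T, K):
    """Value of k at which K / log(k) has fallen to T, i.e. k = exp(K / T).

    math.exp overflows once K / T exceeds roughly 709, which is itself a
    statement about how slow the schedule is.
    """
    return math.exp(K / T)

n_states, c_max, c_min = 32, 3.0, 1.0       # hypothetical problem
k = 10_000

t_gg = geman_bound(k, n_states, c_max, c_min)
t_mr = mitra_bound(k, r=1, slope=c_max - c_min)   # special case S_i = S
ratio = t_gg / t_mr                               # equals N in this special case

k_needed = transitions_to_reach(0.1, c_max - c_min)

print("GeGe84 bound at k = 10000:", t_gg)
print("MiRo85 bound at k = 10000:", t_mr)
print("ratio of the two bounds:  ", ratio)
print("transitions before T_k may reach 0.1 (MiRo85 case):", k_needed)
```

Even for this very small problem, the less restrictive bound does not permit the control parameter to fall to 0.1 until on the order of exp(20) transitions have been executed.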

Both of these published convergence results, as well as several others which are
minor variations of them, are of the general form $K/\log(k)$. This behavior is the key limitation
of the algorithm class, and is believed to be a fundamental limitation imposed by
the neighborhood system inherent in the conventional simulated annealing state
generation mechanism [GeGe84] (i.e. the fact that at low control parameter values, the
likelihood of making the large state transition necessary to escape a local extremum is
radically diminished). The simulated annealing literature includes some amount of speculation
concerning state generation mechanisms which permit occasional large transitions
even at low control parameter values.


SECTION  3 
THE  GENETIC  ALGORITHM 

3.1  Overview 

The genetic algorithm is an iterative improvement stochastic search method appropriate
for application to combinatorial optimization problems and based on the evolution
of biological systems. It implements the fundamental idea of survival fitness on a
population of string structures which are coded representations of solution candidates
selected from the solution space of the optimization problem. The population of candidate
solutions (which collectively represent the current estimate of the optimum solution)
is subjected to a set of stochastic genetic operators which transform a current population
into a new (descendent) population. A variety of distinct genetic operators (based on biological
analogs) are available and are reported in the literature [Davi87, Gold89a, Gref85,
Gref87]. The most important of them are (1) proportional reproduction, (2) crossover and
(3) mutation. A one, two or three operator genetic algorithm employing combinations of
these operators with fixed population size is referred to herein as a simple genetic algorithm.

The  genetic  operators  are  all  implemented  stochastically,  but  they  do  not  result  in  a 
simple  random  walk  through  the  search  space.  They  represent  a  highly  structured  search 
which  exploits  the  historical  record  of  performance  reflected  at  each  stage  of  the  search 
by  the  current  population.  It  is  the  novel  use  of  this  historical  record  which  is  central  to 
the  appeal  of  the  genetic  algorithm. 

Genetic algorithms usually operate on populations of bit-strings (i.e. the optimization
problem is usually coded such that its search space is defined over a binary string
alphabet), and they always attempt to maximize some strictly nonnegative objective
function. The evolution of the fixed size population of candidate solutions toward domination
by optimal solutions is the algorithm goal.

The three genetic operators of a simple genetic algorithm are discussed in the next
subsection. An analysis of their behavior requires introduction to the concept of schemata,
or similarity templates, and that task is undertaken in a subsequent subsection. This
section concludes with an assessment of the theoretical foundation available for the
analysis of genetic algorithms.

3.2  The  Simple  Genetic  Algorithm  Operators 

As  noted  above,  the  simple  genetic  algorithm  employs  three  biologically  inspired 
operators  to  transform  each  population  of  candidate  solutions  into  a  new  (descendent) 
population.  The  following  subsections  examine  each  of  these  operators  and  how  they 
influence  the  search  evolution. 
3.2.1  Reproduction 

The genetic algorithm reproduction operator is the algorithmic analog of asexual
reproduction. It is the means by which the objective function influences the evolution of
the genetic algorithm search. It is implemented by evaluating each member of the current
generation against the objective function and using the results to measure relative reproductive
fitness (i.e. to provide a selection probability measure). Then, members of the
current population are selected in accordance with this fitness measure to be members of
the succeeding generation. This process is repeated (with statistically independent selection
trials) until the entire new generation is populated.
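
The selection procedure just described can be sketched in a few lines. This is an illustrative sketch, not the implementation used in the cited literature: it assumes that the selection probability of a member is its objective value divided by the population total, and the `reward` function, the bit-string population and the function names are hypothetical.

```python
import random

def reproduce(population, reward, rng):
    """Fill a new generation by independent trials, each selecting a member
    of the current population with probability proportional to its reward."""
    weights = [reward(s) for s in population]      # must be strictly positive
    return rng.choices(population, weights=weights, k=len(population))

def reward(s):
    """Hypothetical objective: one plus the number of 1 bits in the string."""
    return 1 + s.count("1")

rng = random.Random(0)
initial = ["00000", "10100", "11110", "00001"]

population = list(initial)
for generation in range(50):
    population = reproduce(population, reward, rng)

print(population)
```

Because selection only copies existing members, every string in any later generation must already have been present in the initial population.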

In the absence of the other genetic operators, the reproduction operator tends to
force the population to converge to the higher performing members of the current population.
It eventually produces a uniform population. At any stage of the search (generation),
only solutions which are represented by members of the current population can appear in
any succeeding generation. In particular, no solution absent from the initial population is
ever attainable. The reproduction operator exerts a strictly converging influence on the
search evolution. The other operators of the simple genetic algorithm circumvent this
limitation in a controlled manner.
3.2.2  Crossover 

The crossover operator in a genetic algorithm is the algorithmic analog of sexual
reproduction. It produces the succeeding generation not by simply replicating the fittest
members of the current generation but by mating the fittest members of the current generation
to produce progeny with some of the "genetic" character of each parent. It is
implemented by randomly exchanging parts of the strings representing the parents to
produce descendent strings.

The crossover operator is implemented (with some given probability, $p_c$) after the
reproduction operator has been invoked to select two reproducing parents. A string location
is randomly selected (usually with uniform selection probability) and the parent bit-string
segments on each side of the randomly selected location are exchanged to produce two
progeny, which are then inserted into the succeeding population. This operation is
repeated until the new generation is completely populated.
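
A minimal sketch of the single-point exchange described above follows. It assumes that the cut location is drawn uniformly from the L - 1 boundaries between adjacent bits and that, when crossover is not invoked, the two parents pass through unchanged; the function name and the example strings are hypothetical.

```python
import random

def one_point_crossover(a, b, p_c, rng):
    """With probability p_c, swap the tails of a and b at a random cut point."""
    if len(a) > 1 and rng.random() < p_c:
        cut = rng.randrange(1, len(a))     # uniform over the L - 1 interior cut points
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a, b

rng = random.Random(1)
parent_a, parent_b = "11111", "00000"
child_a, child_b = one_point_crossover(parent_a, parent_b, 1.0, rng)
print(child_a, child_b)

# Identical parents always yield identical progeny.
same_a, same_b = one_point_crossover("10101", "10101", 1.0, rng)
```

At every bit position the two children carry exactly the two bits carried by the parents, so crossover rearranges existing material without creating bit values that are absent from both parents at that position.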

The crossover operator permits strings not represented in the current population to
be generated in the succeeding population. That is, certain points in the solution space
which are not represented in the current generation can be present in the successor generation.
But the crossover operator is applied preferentially to high performance members
of the current population, so it constitutes a judicious, informed tendency toward
population divergence. This is the novel feature contributed by the crossover operator.

Even with the addition of crossover, the genetic algorithm search will eventually
converge to a uniform population. In general the crossover operator causes a greater portion
of the search space to be explored prior to convergence to uniformity, but for a given
initial population, there are still unreachable points in the solution space. Further, even if
a high performance solution is accessible from the initial population, some portion of the
"gene pool" necessary to reach it can be irrevocably lost during the search evolution.
3.2.3 Mutation

The mutation operator is applied to each member of the successor generation
created by the reproduction and crossover operators. It simply consists of randomly perturbing
each descendent string with some (usually very small) perturbation probability,
$p_m$. The operator exerts a diverging influence on the search algorithm, and it provides a
means by which the search can, with some nonzero probability, always arrive at any point
in the solution space. That is, no part of the "gene pool" is ever permanently extinguished
if the mutation operator is implemented. Clearly, it is analogous to mutation in biological
reproduction. Note also that if $p_m > 0$, the mutation operator precludes the algorithm from
ever producing a permanently uniform population (i.e. it precludes algorithm convergence).
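
The reachability property can be made concrete with a short sketch. It assumes that the perturbation is applied bit by bit, each bit being flipped independently with the mutation probability; under that assumption the probability of turning one string into another is strictly positive whenever the mutation probability lies strictly between 0 and 1. The function names and the example string are hypothetical.

```python
import itertools
import random

def mutate(s, p_m, rng):
    """Flip each bit of s independently with probability p_m."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < p_m else b for b in s)

def mutation_probability(s, t, p_m):
    """Probability that mutate(s, p_m) returns t: p_m^d (1 - p_m)^(L - d),
    where d is the Hamming distance between s and t."""
    d = sum(x != y for x, y in zip(s, t))
    return p_m ** d * (1.0 - p_m) ** (len(s) - d)

s, p_m = "10100", 0.01
targets = ["".join(bits) for bits in itertools.product("01", repeat=len(s))]
probs = {t: mutation_probability(s, t, p_m) for t in targets}

print("least likely target:", min(probs, key=probs.get), min(probs.values()))
print("total probability:  ", sum(probs.values()))
```

Every one of the 32 target strings has nonzero probability, although the complement of the starting string is reached only with probability p_m raised to the fifth power.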

3.3  Building Blocks, Schemata and the Fundamental Theorem

The underlying premise of the genetic algorithm operators is that good solutions to
an optimization problem over a bit-string solution space are composed of locally good
substrings, and that assembling combinations of such locally good substrings is an effective
way to search the space for globally good solutions. In the genetic algorithm literature,
this is referred to as the building block hypothesis. For a problem to be amenable to
genetic algorithm solution, this hypothesis should apply. In the genetics parlance, this
hypothesis is stated as a requirement that the problem exhibit "...some but not too much
epistasis" [Davi87]. The next subsection introduces an idea which helps to place this
hypothesis on a more analytical basis, but the results are still incomplete.
3.3.1  Schema  Defined 

Let the solution space under consideration be the set of binary strings of length $L$
(i.e. $S = \{0,1\}^L$). Then, a schema (plural schemata), designated $H$, is a subset of $S$ having
the property that every member of $H$ matches at some specified set of defining bit locations.
Thus, if $L = 5$, then the schema $H$ might be the set of length 5 bit-strings which
match the string (1,0,1,0,0) at the bit locations indicated by $H = \{s : s = (1,*,*,0,*)\}$, in
which the asterisks indicate "don't care" bits. The bit locations at which the schema is
specified are the defining locations of the schema. The order of the schema, designated
by $o(H)$, is the number of its defining locations and can range from 0 to $L$. In this example,
$o(H) = 2$. The defining length of the schema, designated $\delta(H)$, is the number of bit
positions subtended by its outermost defining bit locations minus 1. In this example,
$\delta(H) = 4 - 1 = 3$.

For a bit-string space of length $L$, there are exactly $3^L$ distinct schemata. This can be
readily determined by noting that the distinct schemata are selected from $\{0,1,*\}^L$. A
given string selected from the space represents exactly $2^L$ distinct schemata. This results
from the fact that the string is defined at all $L$ bit positions, and hence is selected from
$\{0,1\}^L$: each position either keeps its bit as a defining location or is replaced by an
asterisk. The schemata of an optimization problem's search space are the building blocks
from which good solutions are to be constructed.
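
The schema quantities defined in this subsection are easy to compute mechanically. The sketch below represents a schema as a string over the characters 0, 1 and * (a representation chosen here for convenience) and checks the counts quoted above for L = 5; the function names are hypothetical.

```python
import itertools

def order(h):
    """Number of defining (non-asterisk) locations of schema h."""
    return sum(c != "*" for c in h)

def defining_length(h):
    """Distance between the outermost defining locations of h (0 if fewer than two)."""
    fixed = [i for i, c in enumerate(h) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(h, s):
    """True if bit-string s is a realization of schema h."""
    return len(h) == len(s) and all(c in ("*", b) for c, b in zip(h, s))

def schemata_of(s):
    """All schemata realized by s: each position keeps its bit or becomes '*'."""
    return ["".join(h) for h in itertools.product(*[(b, "*") for b in s])]

h = "1**0*"
string = "10100"
all_schemata = ["".join(t) for t in itertools.product("01*", repeat=len(string))]

print("order:", order(h), "defining length:", defining_length(h))
print("schemata over 5 bits:", len(all_schemata))
print("schemata realized by", string, ":", len(schemata_of(string)))
```

For the example schema the order is 2 and the defining length is 3, and the two counts are 3 to the fifth power and 2 to the fifth power respectively.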
3.3.2  Schema  Processing  and  the  Fundamental  Theorem 

Let the constant population size of a simple genetic algorithm be designated $M$.
Then, each generation produced by the algorithm represents some number, $N$, of distinct
schemata that is bounded as follows

$$2^L \leq N \leq M \times 2^L \tag{Eq. 3.1}$$

The lower bound obtains when all $M$ members are identical, and the upper bound represents
a limit on schema diversity supported by the specified population size.

Now, briefly recalling the mechanisms implemented by the three simple genetic
operators, it is possible to begin understanding their influence on the search evolution. In
particular, the reproduction operator tends to reduce, never increase, the number of distinct
schemata present in succeeding generations by selectively reproducing strings which
are realizations of above average fitness schemata to the exclusion of below average
competitors at the same set of defining locations.

The crossover operator, on the other hand, tends to produce new schemata by
assembling high performance low order schemata in new combinations at the expense of
disrupting high order high performance schemata. The extent of population divergence
introduced by the crossover operator is determined in part by the degree of schema diversity
present in the current population. In particular, when the population becomes uniform,
the crossover operator is nullified, because assembling substrings extracted from
identical strings produces identical progeny.

The mutation operator also provides a disruptive mechanism which resists the converging
influence of the reproduction operator. Since any schema can be produced by
mutation with nonzero probability, the permanent extinction of any of the $3^L$ possible
distinct schemata is precluded.

These ideas are captured in the following inequality, which is referred to in the literature
as the Fundamental Theorem of Genetic Algorithms. It relates the number of copies
of a particular schema in the current generation to the expected number of copies of the
same schema in the succeeding generation. This inequality is derived in [Gold89a] from
relatively simple probability notions. The development is not repeated here.

$$E\{m(H,k+1)\} \geq m(H,k) \times \frac{R(H)}{\bar{R}} \left[ 1 - p_c \times \frac{\delta(H)}{L-1} - p_m \times o(H) \right] \tag{Eq. 3.2}$$

where

    $m(H,k)$  =  number of occurrences of schema $H$ in the population at generation $k$,
    $E\{\}$  =  expected value operator,
    $R(H)$  =  average objective function value (> 0) of all strings in the current population which are realizations of $H$,
    $\bar{R}$  =  the average objective function value of the current population.

Equation 3.2 is an inequality because it does not consider the accretion of the
schema $H$ contributed by crossover and mutation. It only accounts for the disruptive
effects of these operators. A more thorough treatment can be found on pp. 9-13 of
[Gref87], but the result is too cumbersome to be of much analytical value.

Qualitatively,  Eq.  3.2  suggests  that  low  order  schemata  occurring  in  the  current 
population  contribute  to  succeeding  generations  in  direct  proportion  to  the  product  of 
their  number  in  the  current  generation  and  their  average  performance  relative  to  the  other 
schemata  competing  for  dominance  of  the  same  set  of  defining  locations.  Crossover  and 
mutation  tend  to  disrupt  this  converging  influence,  and  the  disruptive  effect  of  crossover 
is  directly  proportional  to  the  defining  length  of  the  schema  in  question. 
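
As a worked illustration, the right-hand side of the schema inequality can be evaluated for a small population. The sketch assumes the inequality has the form commonly given in [Gold89a], namely the schema count times the ratio of schema-average to population-average reward times the survival factor 1 - p_c δ(H)/(L - 1) - p_m o(H). The population, the reward function (integer value of the string plus one) and the function names are hypothetical.

```python
def order(h):
    return sum(c != "*" for c in h)

def defining_length(h):
    fixed = [i for i, c in enumerate(h) if c != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(h, s):
    return all(c in ("*", b) for c, b in zip(h, s))

def schema_bound(population, h, reward, p_c, p_m):
    """Lower bound on the expected number of realizations of h in the next generation."""
    members = [s for s in population if matches(h, s)]
    if not members:
        return 0.0
    r_h = sum(reward(s) for s in members) / len(members)         # schema average R(H)
    r_bar = sum(reward(s) for s in population) / len(population)  # population average
    survival = 1.0 - p_c * defining_length(h) / (len(h) - 1) - p_m * order(h)
    return len(members) * (r_h / r_bar) * survival

def reward(s):
    """Hypothetical strictly positive objective: integer value of the string plus one."""
    return int(s, 2) + 1

population = ["10100", "11000", "00110", "10001"]
bound = schema_bound(population, "1**0*", reward, p_c=0.6, p_m=0.01)
short_bound = schema_bound(population, "1****", reward, p_c=0.6, p_m=0.01)

print("bound for 1**0*:", bound)
print("bound for 1****:", short_bound)
```

Both schemata have three above-average realizations in this population, but the order-one schema has defining length zero and so is untouched by the crossover term, which gives it the larger bound.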

In view of Eq. 3.2, the building block hypothesis might be restated as a characteristic
of genetic algorithm amenable optimization problems. A GA amenable problem is one
for which a near optimum solution can be achieved, with a relatively small expenditure of
search effort, by assembling high performance, low order schemata into novel combinations.
If the objective function is such that (nonlinear) contributions from combinations of
bits spanning widely separate bit locations are appreciable (i.e. if the objective function
depends heavily on large defining length schemata), then the problem is not likely to be
suitable for solution by genetic algorithm. On the other hand, if the objective function
depends predominantly on short defining length schemata, then sorting through promising
combinations of realizations of those schemata is likely to isolate good (though not
necessarily optimal) solutions. Accomplishing the required sorting efficiently is the task
for which genetic algorithms are well suited.

3.4  An  Assessment  of  the  Genetic  Algorithm  Theoretical  Foundation 

The existing theoretical foundation for analysis of genetic algorithms includes the
fundamental theorem of genetic algorithms (Eq. 3.2) originally enunciated by Holland
[Holl75] and extended by Bridges and Goldberg [BrGo87], the Walsh function approach
to computing schema fitness averages contributed by Bethke [Beth80] and
generalizations of it [Gold88, Gold89b, BrGo89], a result concerning selection of the
optimal population size for the algorithm [Gold85] in terms of the solution space dimension,
and the examination of the properties which make a problem difficult for genetic
algorithms (the so called minimal deceptive problem) [Gold87, Gold89b]. Also, both De
Jong [Dejo75] and Goldberg/Segrest [GoSe87] employ Markov chain methodology
accompanied by approximate numerical analysis to examine certain specific problems
concerning finite length chain behavior (e.g. genetic drift in a binary allele genetic algorithm).

No  complete  theoretical  model  exists  for  describing  the  operation  of  the  simple 
genetic  algorithm  executing  on  a  specified  optimization  problem.  The  central  theme  of 
the  work  underlying  this  paper  is  an  attempt  to  develop  such  a  model  based  upon  the 
asymptotic  behavior  of  a  Markov  chain  which  represents  the  algorithm. 


SECTION  4 

A  MARKOV  CHAIN  MODEL  OF  THE  SIMPLE  GENETIC  ALGORITHM 

4.1  Overview 

From the discussion of the simple genetic algorithm operators in Section 3.2, it is
clear that the sequence of populations generated by the algorithm when executing on a
specified combinatorial optimization problem is a stochastic process (with finite state
space), and further that the conditional dependence of each population in the sequence on
its predecessors is completely described by its dependence upon the immediate predecessor
population. Thus, the sequence is a Markov chain (Definition A1). In this section, a
nonstationary Markov chain model of the simple genetic algorithm is developed for one,
two and three-operator variants of the algorithm. The model is tailored to resemble that
offered in Section 2.4.1 for simulated annealing. The one-operator genetic algorithm
model implements proportional reproduction only, while the two-operator variant
employs reproduction in combination with mutation. The three-operator algorithm implements
reproduction, mutation and crossover. This model hierarchy is employed because it
provides some degree of insight into the effect that each operator has on the nature of the
state space of the resulting Markov chain.

Describing and analyzing the operation of the simple genetic algorithm is facilitated
by assuming that the underlying optimization problem is defined over a bit-string solution
space. This assumption is not essential and sacrifices very little generality. It is adopted
throughout the following sections.

4.2  The  Markov  Chain  Model 

Let a combinatorial optimization problem be characterized by the pair $(S,R)$ where
$S = \{0,1\}^L$ and $R$ is a strictly positive real valued reward function, and assume, with no
loss of generality, that the problem requires maximization of $R$. Also, let a simple genetic
algorithm designed to execute on this problem have fixed population size $M$, let $i \in S$ be
interpreted as an unsigned integer ($0 \leq i \leq 2^L - 1$), and let a generation be represented by
$m = (m(0), m(1), \ldots, m(2^L - 1))$ where $m(i)$ is the number of occurrences of solution $i \in S$
in the population. Thus, in the parlance of combinatorial mathematics, $m$ is a distribution
of $M$ nondistinct objects over $N = \mathrm{card}(S) = 2^L$ bins [Hall67, Rior58], and the set of all
such distributions, $S' = \{m\}$, is a suitable representation of the simple genetic algorithm
search space. The cardinality of $S'$ is given by

$$N' = \mathrm{card}(S') = \binom{M + 2^L - 1}{M} = \binom{M + N - 1}{M}. \tag{Eq. 4.1}$$

Since both $N$ and $M$ are finite, so is $N'$.
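
The size of the population state space can be checked by brute force for small cases. The sketch below assumes the standard count of distributions of M nondistinct objects over N bins, the binomial coefficient C(M + N - 1, M), and enumerates the occupancy vectors m explicitly; the function names are hypothetical.

```python
import itertools
import math

def population_state_count(M, L):
    """Number of distinct populations of size M over bit-strings of length L."""
    N = 2 ** L
    return math.comb(M + N - 1, M)

def enumerate_states(M, L):
    """List every occupancy vector m = (m(0), ..., m(2^L - 1)) with sum M."""
    N = 2 ** L
    states = []
    for combo in itertools.combinations_with_replacement(range(N), M):
        m = [0] * N
        for i in combo:
            m[i] += 1
        states.append(tuple(m))
    return states

states = enumerate_states(M=3, L=2)
print("enumerated states:", len(states))
print("formula:          ", population_state_count(3, 2))
print("states for M = 20, L = 10:", population_state_count(20, 10))
```

The state space is finite but grows very quickly with M and L, which is why the transition matrix of the model is a tool for analysis rather than something to be constructed explicitly for realistic problems.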

Then, if $m_0 \in S'$ is selected as an initial population, the simple genetic algorithm
can be represented by the quadruple $(S', m_0, P_Q, \Gamma)$ where $P_Q$ is a state transition matrix
(analogous to $P_T$ of the simulated annealing model) and $\Gamma = \{Q_k\}$ is a finite length
sequence of parameter vectors $Q_k = (p_m(k), p_c(k))$. The algorithm parameters $p_m(k)$ and
$p_c(k)$ are respectively the mutation and crossover probabilities. In the following sections,
the mutation probability sequence is employed in a role analogous to absolute temperature
in simulated annealing, and consideration is limited hereafter to monotone nonincreasing
sequences. In general, the only limitation on the crossover probability sequence
is that its values are probabilities. However, in all of the following, consideration is
limited to constant crossover probability sequences.

The first parameter vector in $\Gamma$ is $Q_1$, and the final parameter vector is $Q_f$. The solution evolves as a sequence $\{m_k\}$ of states $m_k \in S'$ in which the conditional dependence of $m_{k+1}$ on the sequence history is equivalent to its conditional dependence on $m_k$, and thus the solution sequence is a Markov chain. In general, the chain is inhomogeneous (Definition A5). In Section 4.3 it is shown to be time-homogeneous if the parameter vectors are constant. As with the simulated annealing algorithm model, exhausting the sequence of control parameter values, $\Gamma$, signals algorithm termination, and the parameter vectors can be allowed to depend on $\{m_k\}$ provided the algorithm termination requirement is satisfied.

4.3  State  Behavior  of  the  Simple  Genetic  Algorithm 

In each of the next three subsections, the state transition mechanism (and its effect on the nature of the solution sequence) which results from applying a specified combination of the genetic algorithm operators to the Markov chain model is examined. The first case consists of a one-operator algorithm which employs only the reproduction operator. The second is a two-operator algorithm which employs reproduction and mutation. Finally, a three-operator algorithm which includes crossover with reproduction and mutation is examined.

Although it is most natural to describe the genetic operators in the order reproduction/crossover/mutation, the course adopted in Section 3.2, the following development proceeds most instructively if mutation is included with reproduction in the two-operator algorithm and crossover is deferred to the three-operator case. This is due to the fact that the mutation operator provides the essential state space modification required to make the Markov chains of the time-homogeneous two and three-operator algorithms irreducible (Definitions A7 and A8, Theorem A1), and consequently causes them to have unique stationary distributions (Theorem A3). The one-operator algorithm (proportional reproduction only) does not satisfy the irreducibility requirement for existence of a unique stationary distribution. (Neither does the algorithm variant which employs reproduction and crossover without mutation.) A unique stationary distribution means that the asymptotic state occupancy probability of the time-homogeneous two and three-operator algorithms is completely determined by the algorithm parameters and objective function. It is independent of the starting state (initial population). Asymptotic independence of the starting state is a necessary (but not sufficient) condition on the zero mutation probability limit of the stationary distribution of the time-homogeneous algorithm for the inhomogeneous algorithm counterpart to avoid (asymptotically) entrapment in local optima.

4.3.1  A One-Operator Algorithm (Reproduction)

In this subsection, the nature of the state transition matrix is examined for the case of no crossover or mutation (i.e. $Q_k = (0, 0)$ for $0 < k \le f$). In this case, the conditional probability of selecting a solution $i \in S$ from a population described by the state vector $n \in S'$ is (i.e. proportional reproduction)

$$\forall i \in S, \forall n \in S' : \; P_1(i \mid n) = \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \qquad \text{Eq. 4.2}$$



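where the subscript 1 indicates that the one-operator case is under consideration. Thus, the conditional probability of the successor generation described by $m$ given that the present generation is described by $n$ is a multinomial distribution, i.e.

$$\begin{aligned}
\forall m, n \in S' : \; P_1(m \mid n) &= \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_1(i \mid n)^{m(i)} \\
&= \binom{M}{m} \times \prod_{i \in S} P_1(i \mid n)^{m(i)} \\
&= \binom{M}{m} \times \prod_{i \in S} \left[ \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \right]^{m(i)}
\end{aligned} \qquad \text{Eq. 4.3}$$
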


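where again the subscript 1 distinguishes the one-operator case, where the symbol

$$\binom{M}{m} = \frac{M!}{\prod_{i \in S} m(i)!} \qquad \text{Eq. 4.4}$$

designates the indicated multinomial coefficient and where by definition

$$n(i) = 0 \;\Rightarrow\; \left[ \frac{n(i) \times R(i)}{\sum_{j \in S} n(j) \times R(j)} \right]^{m(i)} = \begin{cases} 1 & m(i) = 0 \\ 0 & m(i) > 0 \end{cases} \qquad \text{Eq. 4.5}$$
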


The transition probability matrix of the Markov chain representing the one-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.3, i.e.

$$P = [P_1(m \mid n)]. \qquad \text{Eq. 4.6}$$

Since it is independent of the sequence index (i.e. the parameter vectors are constant), the one-operator Markov chain is time-homogeneous (Definition A5).
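
The one-operator selection rule and multinomial transition probabilities can be written out directly for a small case. The sketch below is illustrative only: the reward values in `R`, the choice M = 2, L = 2, and all function names are assumptions made for the example. Exact rational arithmetic is used so that the row sums can be compared with 1 without rounding.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial


def states(M, N):
    """All occupancy vectors of M individuals over N solutions."""
    out = []
    for members in combinations_with_replacement(range(N), M):
        m = [0] * N
        for i in members:
            m[i] += 1
        out.append(tuple(m))
    return out


def p1_select(n, R):
    """Proportional selection probabilities P1(i | n) of Eq. 4.2."""
    total = sum(n_j * R_j for n_j, R_j in zip(n, R))
    return [Fraction(n_i * R_i, total) for n_i, R_i in zip(n, R)]


def p1_transition(m, n, R):
    """Multinomial probability P1(m | n) of Eq. 4.3."""
    p = p1_select(n, R)
    coeff = factorial(sum(m))
    prob = Fraction(1)
    for m_i, p_i in zip(m, p):
        coeff //= factorial(m_i)   # builds the multinomial coefficient
        prob *= p_i ** m_i         # Fraction(0) ** 0 == 1, as in Eq. 4.5
    return coeff * prob


R = [1, 2, 3, 4]                   # hypothetical strictly positive rewards
S_prime = states(2, len(R))        # M = 2, N = 4
P = [[p1_transition(m, n, R) for m in S_prime] for n in S_prime]
row_sums = [sum(row) for row in P]
```

Every entry of `row_sums` equals 1, confirming that each row of the array is a probability distribution over S'.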

The states which represent uniform populations (i.e. the states $m_A \in S_A' \subset S'$ in which one component is $M$ and all others are zero) are absorbing states of the Markov chain, because for any such state, $P_1(m_A \mid m_A) = 1$ and Definition A6 applies. Since it follows from Eq. 4.2-4.3 that $\forall n \in S' - S_A' : P_1(n \mid n) < 1$, there are exactly $N = 2^L$ absorbing states. The corresponding rows of $P$ are given by

$$\forall n_A \in S_A' : \; P_1(m \mid n_A) = \begin{cases} 1 & m = n_A \\ 0 & m \in S' - \{n_A\} \end{cases} \qquad \text{Eq. 4.7}$$

Thus, for each state $n_A \in S_A'$, the associated row of the state transition matrix (Eq. 4.6) contains 1 in the principal diagonal location and 0 elsewhere. It follows that the $N' \times 1$ probability vector $q_{n_A}$ (Definition A2) whose $n_A \in S_A'$ component is 1 is a stationary distribution (Definition A10) of the one-operator Markov chain. It is not unique because any of the $N = 2^L$ such vectors satisfies the requirement, as does any vector of the form $q = \sum_{n_A \in S_A'} \lambda_{n_A} q_{n_A}$ where $\lambda_{n_A} \ge 0$ and $\sum_{n_A \in S_A'} \lambda_{n_A} = 1$.

The absorbing states preclude irreducibility (Theorem A1), so the Markov chain does not satisfy the requirements of Theorem A3. The chain is aperiodic (Definition A9) however, because $\forall m \in S' : P_1(m \mid m) > 0$, so the period of all states is 1. Thus, all of the conditions of Theorem A3 except irreducibility are met by the one-operator Markov chain.
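
The absorbing-state count, the aperiodicity condition and the non-uniqueness of the stationary distribution can all be confirmed on a small instance. This is a hedged sketch: the rewards `R`, the size M = 3, L = 2, and the mixing weights 1/4 and 3/4 are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import factorial


def states(M, N):
    """All occupancy vectors of M individuals over N solutions."""
    out = []
    for members in combinations_with_replacement(range(N), M):
        m = [0] * N
        for i in members:
            m[i] += 1
        out.append(tuple(m))
    return out


def transition_row(n, R, S_prime):
    """Row of the one-operator matrix: P1(m | n) for every m in S'."""
    total = sum(a * b for a, b in zip(n, R))
    p = [Fraction(a * b, total) for a, b in zip(n, R)]
    row = []
    for m in S_prime:
        coeff, prob = factorial(sum(m)), Fraction(1)
        for m_i, p_i in zip(m, p):
            coeff //= factorial(m_i)
            prob *= p_i ** m_i
        row.append(coeff * prob)
    return row


R = [3, 1, 2, 5]                      # hypothetical strictly positive rewards
S_prime = states(3, len(R))           # M = 3, N = 4
P = [transition_row(n, R, S_prime) for n in S_prime]
size = len(S_prime)

absorbing = [k for k in range(size) if P[k][k] == 1]
aperiodic = all(P[k][k] > 0 for k in range(size))

# A convex combination of point masses on absorbing states is stationary.
q = [Fraction(0)] * size
q[absorbing[0]] = Fraction(1, 4)
q[absorbing[-1]] = Fraction(3, 4)
qP = [sum(q[k] * P[k][j] for k in range(size)) for j in range(size)]
print(len(absorbing), aperiodic, qP == q)
```

Changing the two weights to any other nonnegative pair summing to one leaves `qP == q` true, which is the non-uniqueness noted above.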

The expected number of transitions required to arrive in an absorbing state, $E\{k_A\}$, is finite. An upper bound on $E\{k_A\}$ is given by

$$E\{k_A\} \le \left[ \frac{M \times R_{max}}{R_{min}} \right]^{2M} < \infty \qquad \text{Eq. 4.8}$$

where $R_{max}$ and $R_{min}$ are the extreme values of $R$. (Recall that $R$ is assumed strictly positive, so $R_{max} \ge R_{min} > 0$.) Eq. 4.8 can be derived by defining $p_A(k)$ as the conditional probability of arriving in the set of absorbing states, $S_A'$, on the $k$th transition given that the $k$th state is not absorbing, letting $p_{min}$ be a lower bound on $p_A(k)$, and bounding the series for $E\{k_A\}$ as follows

$$\begin{aligned}
E\{k_A\} &= \sum_{k} k \times p_A(k) \prod_{l=1}^{k-1} (1 - p_A(l)) \\
&\le \sum_{k} k \times \prod_{l=1}^{k-1} (1 - p_A(l)) \\
&\le \sum_{k} k \times (1 - p_{min})^{k-1} \\
&= 1/(p_{min})^2.
\end{aligned} \qquad \text{Eq. 4.9}$$
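
The closed form $1/(p_{min})^2$ in the last step of Eq. 4.9 is consistent with the derivative of the geometric series. As a brief worked step, assuming the bounding series is $\sum_k k (1 - p_{min})^{k-1}$ and writing $x = 1 - p_{min}$ (a symbol introduced here only for convenience):

$$\sum_{k=1}^{\infty} k\, x^{k-1} = \frac{d}{dx} \sum_{k=0}^{\infty} x^{k} = \frac{d}{dx}\, \frac{1}{1 - x} = \frac{1}{(1 - x)^2}, \qquad 0 \le x < 1,$$

so that substituting $x = 1 - p_{min}$ gives $1/(p_{min})^2$.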

Next, note that a suitable bounding value on $p_{min}$ is

$$p_{min} = \left[ \frac{R_{min}}{M \times R_{max}} \right]^{M} > 0. \qquad \text{Eq. 4.10}$$

The desired bound on $E\{k_A\}$ (Eq. 4.8) is then obtained by using Eq. 4.10 in Eq. 4.9.

It is noteworthy that the above absorbing state convergence result does not require any assumption on the range of $R_{max} - R_{min}$. Even when the objective function exerts zero selective pressure (i.e. $\forall i \in S : R(i) = R_{min} = R_{max}$), the finite population size still results in convergence to an absorbing state. In the genetics parlance, this tendency is referred to as genetic drift. It is responsible for the inevitable convergence of the one-operator simple genetic algorithm, as discussed in Section 3.2.1.

4.3.2  A Two-Operator Algorithm (Reproduction and Mutation)

In this subsection, the nature of the state transition matrix is examined when the mutation operator is applied with some probability in the range $0 < p_m(k) < 1$ (i.e. $Q_k = (p_m(k), 0)$). Let $P_2(i \mid n)$ and $P_2(m \mid n)$ be the conditional distributions of the two-operator algorithm corresponding to the one-operator distributions defined by Eq. 4.2-4.5. Then, $P_2(i \mid n)$ and $P_2(m \mid n)$ must account for the effect of nonzero $p_m$. This can be accomplished by expressing $P_2(i \mid n)$ as a sum over all $j$ of the corresponding $P_1(j \mid n)$ times a factor which accounts for the probability of the collection of mutation events required to transform $j$ into $i$. This probability can be expressed as $p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)}$ where $H(i,j) = H(j,i)$ is the Hamming distance of the pair $i, j$. That is, $H$ is a function defined on $S \times S$ with values in $\{0, 1, 2, \ldots, L\}$. $H(i,j)$ is the number of bits which must be altered by mutation to transform $i$ into $j$ and $L - H(i,j)$ is the number of bits which must remain unaltered. Thus, $P_2(i \mid n)$ can be written as


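$$\begin{aligned}
\forall i \in S, \forall n \in S' : \; P_2(i \mid n) &= \sum_{j \in S} p_m^{H(i,j)} (1 - p_m)^{L - H(i,j)} \times P_1(j \mid n) \\
&= (1 - p_m)^L \times \sum_{j \in S} \left[ \frac{p_m}{1 - p_m} \right]^{H(i,j)} \times P_1(j \mid n) \\
&= \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} \times P_1(j \mid n)
\end{aligned} \qquad \text{Eq. 4.11}$$

where

$$\alpha = \frac{p_m}{1 - p_m} \qquad \text{Eq. 4.12}$$

and

$$p_m = \frac{\alpha}{1 + \alpha}. \qquad \text{Eq. 4.13}$$
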

For $p_m = 0$ or $p_m = 1$, Eq. 4.11 includes the indeterminate form $0^0$ in some terms. Thus, the admissible range of $p_m$ is restricted to $0 < p_m < 1$, and consequently that of $\alpha$ (Eq. 4.12) is $0 < \alpha < \infty$. However, cases corresponding to $p_m > 1/2 \Leftrightarrow \alpha > 1$ are of no practical interest (they are less random than the case $p_m = 1/2 \Leftrightarrow \alpha = 1$), and some of the following developments restrict consideration to the range $0 < p_m \le 1/2 \Leftrightarrow 0 < \alpha \le 1$.
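
The change of variable between the mutation probability and the ratio parameter of Eq. 4.12-4.13 is easy to exercise numerically. The sketch below uses hypothetical helper names and an arbitrary string length and mutation rate. It also checks that the per-string mutation factor sums to one over all target strings and that the two forms of the factor agree.

```python
def alpha_from_pm(p_m):
    """Eq. 4.12: alpha = p_m / (1 - p_m), valid for 0 < p_m < 1."""
    return p_m / (1.0 - p_m)


def pm_from_alpha(alpha):
    """Eq. 4.13: p_m = alpha / (1 + alpha)."""
    return alpha / (1.0 + alpha)


def hamming(i, j):
    """Number of differing bits between unsigned integers i and j."""
    return bin(i ^ j).count("1")


def mutation_kernel(i, j, alpha, L):
    """Probability that bitwise mutation transforms string j into string i."""
    return alpha ** hamming(i, j) / (1.0 + alpha) ** L


L, p_m = 4, 0.05                      # illustrative values only
alpha = alpha_from_pm(p_m)
row_sum = sum(mutation_kernel(i, 9, alpha, L) for i in range(2 ** L))
direct = p_m ** hamming(3, 9) * (1 - p_m) ** (L - hamming(3, 9))
```

Here `row_sum` is 1 up to rounding because the sum over all strings of the ratio parameter raised to the Hamming distance equals one plus that parameter, raised to the power L.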

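Substituting Eq. 4.2 into Eq. 4.11 yields

$$P_2(i \mid n) = \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \alpha^{H(i,j)} \times \frac{n(j) \times R(j)}{\sum_{k \in S} n(k) \times R(k)}.$$

It is also straightforward to show that

$$\sum_{k \in S} n(k) \times R(k) = \frac{1}{(1 + \alpha)^L} \sum_{j \in S} \sum_{k \in S} n(k) \times R(k) \times \alpha^{H(j,k)}.$$

Thus, $P_2(i \mid n)$ can be expressed as

$$P_2(i \mid n) = \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{j \in S} \sum_{k \in S} n(k) \times R(k) \times \alpha^{H(j,k)}} = \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{(1 + \alpha)^L \sum_{k \in S} n(k) \times R(k)} \qquad \text{Eq. 4.14}$$

and $P_2(m \mid n)$ is multinomially distributed as follows

$$\begin{aligned}
\forall m, n \in S' : \; P_2(m \mid n) &= \frac{M!}{\prod_{i \in S} m(i)!} \times \prod_{i \in S} P_2(i \mid n)^{m(i)} \\
&= \binom{M}{m} \times \prod_{i \in S} P_2(i \mid n)^{m(i)} \\
&= \binom{M}{m} \frac{1}{(1 + \alpha)^{ML}} \prod_{i \in S} \left[ \frac{\sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)}}{\sum_{k \in S} n(k) \times R(k)} \right]^{m(i)}
\end{aligned} \qquad \text{Eq. 4.15}$$
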
The transition probability matrix of the Markov chain representing the two-operator algorithm is composed of the array of conditional probabilities defined by Eq. 4.15, i.e.

$$P = [P_2(m \mid n)]. \qquad \text{Eq. 4.16}$$

Since the elements of $P$ depend on $\alpha$ (and hence by Eq. 4.12 on $p_m(k)$), the two-operator Markov chain is generally not time-homogeneous. It is time-homogeneous if the mutation probability is fixed.

Eq. 4.14-4.16 for the two-operator simple genetic algorithm are analogous to Eq. 4.2-4.6 for the one-operator variant except that $P_2(i \mid n)$ is strictly greater than zero for all $n \in S'$. Thus, the two-operator analog of Eq. 4.5 is not required. Also

$$\lim_{\alpha \to 0^+} P_2(i \mid n) = P_1(i \mid n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_2(m \mid n) = P_1(m \mid n). \qquad \text{Eq. 4.17}$$



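The rows of the state transition matrix corresponding to the one-operator absorbing states have an especially simple form. Let $i_A \in S$ be the solution represented in the absorbing state $n_A \in S_A'$. Then, from Eq. 4.14,

$$P_2(i \mid n_A) = \frac{M \times R(i_A) \times \alpha^{H(i, i_A)}}{(1 + \alpha)^L \times M \times R(i_A)} = \frac{\alpha^{H(i, i_A)}}{(1 + \alpha)^L}. \qquad \text{Eq. 4.18}$$

Thus, from Eq. 4.15,

$$P_2(m \mid n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) \times H(i, i_A)}}{(1 + \alpha)^{ML}}. \qquad \text{Eq. 4.19}$$

Since the reward function, $R$, is strictly positive by hypothesis, and since $\forall i, j \in S : 0 \le H(i,j) \le L$, it follows that for $\alpha$ in the range $0 < \alpha \le 1$,

$$\alpha^L \sum_{j \in S} n(j) \times R(j) \le \sum_{j \in S} n(j) \times R(j) \times \alpha^{H(i,j)} \le \sum_{j \in S} n(j) \times R(j),$$

and consequently from Eq. 4.14 that

$$\forall i \in S, \forall n \in S' : \; \left[ \frac{\alpha}{1 + \alpha} \right]^L \le P_2(i \mid n) \le \left[ \frac{1}{1 + \alpha} \right]^L. \qquad \text{Eq. 4.20}$$

Using Eq. 4.20 in Eq. 4.15 yields

$$\forall m, n \in S' : \; \binom{M}{m} \left[ \frac{\alpha}{1 + \alpha} \right]^{ML} \le P_2(m \mid n) \le \binom{M}{m} \left[ \frac{1}{1 + \alpha} \right]^{ML}. \qquad \text{Eq. 4.21}$$
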

From the lower bound in Eq. 4.21, the final requirement of Theorem A3 (irreducibility) is fulfilled and the Markov chain for the time-homogeneous two-operator simple genetic algorithm possesses a unique stationary distribution, $q_\alpha$, given by

$$\forall m \in S' : \quad (1)\; q_\alpha(m) > 0, \qquad (2)\; q_\alpha^T \mathbf{1} = 1, \qquad (3)\; q_\alpha^T = q_\alpha^T P.$$

Since the stationary distribution is by definition a left eigenvector of the state transition matrix (Definition A10), it follows from Eq. 4.15 and 4.16 that the asymptotic state probability distribution of the time-homogeneous two-operator algorithm is completely determined by the objective function and the algorithm parameters. It is independent of the starting state, $m_0$.

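
Independence of the starting state can be observed numerically on a small two-operator chain. The following sketch is illustrative only: it assumes arbitrary rewards `R`, a population of M = 2 over L = 2 bits, a fixed mutation ratio of 0.1, and plain power iteration (a generic numerical technique, not a method taken from this report) run from two different initial populations.

```python
from itertools import combinations_with_replacement
from math import factorial


def states(M, N):
    """All occupancy vectors of M individuals over N solutions."""
    out = []
    for members in combinations_with_replacement(range(N), M):
        m = [0] * N
        for i in members:
            m[i] += 1
        out.append(tuple(m))
    return out


def hamming(i, j):
    return bin(i ^ j).count("1")


def p2_select(n, R, alpha, L):
    """Selection-plus-mutation probabilities P2(i | n) in the form of Eq. 4.14."""
    N = len(R)
    denom = (1.0 + alpha) ** L * sum(n[k] * R[k] for k in range(N))
    return [sum(n[j] * R[j] * alpha ** hamming(i, j) for j in range(N)) / denom
            for i in range(N)]


def multinomial(m, p):
    """Multinomial probability of occupancy vector m under probabilities p."""
    coeff, prob = factorial(sum(m)), 1.0
    for m_i, p_i in zip(m, p):
        coeff //= factorial(m_i)
        prob *= p_i ** m_i
    return coeff * prob


def stationary(P, start, sweeps=2000):
    """Power iteration q <- q P from a point mass on state index `start`."""
    size = len(P)
    q = [0.0] * size
    q[start] = 1.0
    for _ in range(sweeps):
        q = [sum(q[k] * P[k][j] for k in range(size)) for j in range(size)]
    return q


L, M, alpha = 2, 2, 0.1
R = [1.0, 2.0, 3.0, 4.0]              # hypothetical strictly positive rewards
S_prime = states(M, 2 ** L)
P = [[multinomial(m, p2_select(n, R, alpha, L)) for m in S_prime]
     for n in S_prime]
q_a = stationary(P, 0)                     # start from population (2,0,0,0)
q_b = stationary(P, len(S_prime) - 1)      # start from population (0,0,0,2)
gap = max(abs(x - y) for x, y in zip(q_a, q_b))
```

The two runs converge to the same strictly positive vector, so `gap` is at the level of floating-point rounding.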
4.3.3  A Three-Operator Algorithm (Reproduction, Mutation and Crossover)

The three-operator simple genetic algorithm corresponds to the case $\forall k : Q_k = (p_m(k), p_c(k))$ with both $p_m(k)$ and $p_c(k)$ nonzero. Results analogous to Eq. 4.14-4.21 for the two-operator case are obtainable by defining a new function which is similar in character to the Hamming distance function employed in Section 4.3.2. This subsection completes that generalization. The result reflects the crossover operation only implicitly; however, it permits some very significant conclusions concerning bounding values of the three-operator conditional probabilities.

The new function, $I(i,j,k,s)$, is defined over an ordered quadruple $(i,j,k,s)$ where $i,j,k \in S$ and where $s \in \{0, 1, \ldots, L\}$ is a bit-string location. The states $i, j \in S$ represent respectively the first and second parent strings selected at a particular crossover opportunity and $k \in S$ represents a possible descendent string. The bit-string location $s$ is the location randomly selected by the crossover operator, and normally it is uniformly distributed over its range. Thus, $I$ is defined on $S \times S \times S \times \{0, 1, 2, \ldots, L\}$ and it takes on values selected from $\{0, 1\}$ depending upon whether the indicated crossover operation is or is not consistent. That is, $I$ assumes the value one if the bit-string $k$ is produced by crossing the bit-strings $i$ and $j$ at the site $s$, and zero otherwise.
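
A crossover indicator of this kind can be sketched in a few lines once a crossing convention is fixed. The convention below is an assumption made for illustration and is not stated in the text: strings are read most-significant bit first, and crossing at site s keeps the leading s bits of the first parent and the trailing L - s bits of the second. The function names are likewise hypothetical.

```python
def crossover_child(i, j, s, L):
    """Child with the leading s bits of parent i and trailing L - s bits of j.

    Assumed convention: most-significant bit first, single crossover site s.
    """
    low_mask = (1 << (L - s)) - 1            # trailing L - s bit positions
    high_mask = ~low_mask & ((1 << L) - 1)   # leading s bit positions
    return (i & high_mask) | (j & low_mask)


def crossover_indicator(i, j, k, s, L):
    """I(i, j, k, s): 1 if crossing i and j at site s yields k, else 0."""
    return 1 if crossover_child(i, j, s, L) == k else 0


# Crossing 1100 and 0011 after the second bit yields 1111 under this convention.
print(crossover_indicator(0b1100, 0b0011, 0b1111, 2, 4))
```

For any fixed parents and site exactly one descendent string is consistent, so the indicator sums to one over k. That property is what makes the crossover term in the conditional probabilities a proper distribution.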
In terms of this crossover operator function, the conditional probability of producing, via reproduction and crossover, a solution $k \in S$ given a current population described by $n \in S'$ is

$$\begin{aligned}
P_2'(k \mid n) &= p_c \times \sum_{i \in S} \sum_{j \in S} P_1(i \mid n) \times P_1(j \mid n) \times \frac{1}{L} \sum_{s} I(i,j,k,s) + (1 - p_c) \times P_1(k \mid n) \\
&= p_c \times \frac{1}{L} \times \sum_{i \in S} \sum_{j \in S} \sum_{s=1}^{L} P_1(i \mid n) \times P_1(j \mid n) \times I(i,j,k,s) + (1 - p_c) \times P_1(k \mid n)
\end{aligned} \qquad \text{Eq. 4.22}$$

where $P_1(i \mid n)$ is as defined in Eq. 4.2 and where $P_2'(\cdot \mid n)$ refers to the two-operator algorithm consisting of reproduction and crossover without mutation. This result assumes uniformly distributed crossover site selection.

The array of conditional probabilities $[P_2'(i \mid n)]$ plays a role in the three-operator simple genetic algorithm closely analogous to the role played by the array $[P_1(i \mid n)]$ in the two-operator variant. In fact, the elements of the $[P_2'(i \mid n)]$ array can be used as counterparts of Eq. 4.2 to develop results exactly analogous to Eq. 4.3 and Eq. 4.6. Further, for $n_A \in S_A'$, Eq. 4.22 reduces to

$$P_2'(k \mid n_A) = P_1(k \mid n_A), \qquad \text{Eq. 4.23}$$

and consequently this (fictitious) two-operator algorithm (reproduction and crossover) demonstrates the same sort of absorbing state behavior as the one-operator algorithm.

From Eq. 4.22, the three-operator conditional probabilities and state transition matrix are expressible as

$$P_3(i \mid n) = \frac{1}{(1 + \alpha)^L} \times \sum_{j \in S} \alpha^{H(i,j)} \times P_2'(j \mid n), \qquad \text{Eq. 4.24}$$

$$P_3(m \mid n) = \binom{M}{m} \times \prod_{i \in S} P_3(i \mid n)^{m(i)}, \qquad \text{Eq. 4.25}$$

and

$$P = [P_3(m \mid n)]. \qquad \text{Eq. 4.26}$$

These results are developed in a fashion analogous to Eq. 4.14-4.16. From them, it follows that the three-operator Markov chain is time-homogeneous if both the mutation and crossover probabilities are fixed. In general it is not time-homogeneous.

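From Eq. 4.22, 4.24 and 4.25, it follows that

$$\lim_{\alpha \to 0^+} P_3(i \mid n) = P_2'(i \mid n) \quad \text{and} \quad \lim_{\alpha \to 0^+} P_3(m \mid n) = P_2'(m \mid n). \qquad \text{Eq. 4.27}$$

Also, from Eq. 4.23-4.25, the three-operator analogs of Eq. 4.18-4.19 apply

$$P_3(i \mid n_A) = \frac{\alpha^{H(i, i_A)}}{(1 + \alpha)^L} \qquad \text{Eq. 4.28}$$

$$P_3(m \mid n_A) = \binom{M}{m} \frac{\alpha^{\sum_{i \in S} m(i) \times H(i, i_A)}}{(1 + \alpha)^{ML}}. \qquad \text{Eq. 4.29}$$

Additionally, since

$$\alpha^L \sum_{j \in S} P_2'(j \mid n) \le \sum_{j \in S} P_2'(j \mid n) \times \alpha^{H(i,j)} \le \sum_{j \in S} P_2'(j \mid n),$$

the three-operator analogs of Eq. 4.20-4.21 follow from Eq. 4.24-4.25, i.e.

$$\forall i \in S, \forall n \in S' : \; \left[ \frac{\alpha}{1 + \alpha} \right]^L \le P_3(i \mid n) \le \left[ \frac{1}{1 + \alpha} \right]^L \qquad \text{Eq. 4.30}$$

and

$$\forall m, n \in S' : \; \binom{M}{m} \left[ \frac{\alpha}{1 + \alpha} \right]^{ML} \le P_3(m \mid n) \le \binom{M}{m} \left[ \frac{1}{1 + \alpha} \right]^{ML}. \qquad \text{Eq. 4.31}$$
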
All of the state space characteristics described in Section 4.3.2 for the two-operator algorithm follow. In particular, the Markov chain of the three-operator algorithm is irreducible. Thus, a unique stationary distribution exists for the time-homogeneous three-operator simple genetic algorithm, and as in the two-operator case it is completely determined by the objective function and the algorithm parameter values.

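
The three-operator selection probabilities can be evaluated directly for a small problem and compared with the uniform-population form and the bounds stated above. This sketch rests on stated assumptions: the crossover convention (leading s bits from the first parent, trailing bits from the second, sites s = 1, ..., L), the reward values, the crossover probability 0.6 and the mutation ratio 0.2 are all illustrative choices.

```python
def hamming(i, j):
    return bin(i ^ j).count("1")


def cross(i, j, s, L):
    """Assumed convention: leading s bits of i, trailing L - s bits of j."""
    low = (1 << (L - s)) - 1
    return (i & ~low & ((1 << L) - 1)) | (j & low)


def p1_select(n, R):
    """Proportional selection probabilities."""
    total = sum(a * b for a, b in zip(n, R))
    return [a * b / total for a, b in zip(n, R)]


def p2prime_select(n, R, p_c, L):
    """Reproduction followed by crossover, without mutation."""
    p1 = p1_select(n, R)
    N = len(R)
    out = [(1.0 - p_c) * p1[k] for k in range(N)]
    for i in range(N):
        for j in range(N):
            for s in range(1, L + 1):     # uniformly distributed site
                out[cross(i, j, s, L)] += p_c * p1[i] * p1[j] / L
    return out


def p3_select(n, R, p_c, alpha, L):
    """Mutation applied to the reproduction/crossover result."""
    p2p = p2prime_select(n, R, p_c, L)
    N = len(R)
    scale = (1.0 + alpha) ** L
    return [sum(alpha ** hamming(i, j) * p2p[j] for j in range(N)) / scale
            for i in range(N)]


L, p_c, alpha = 4, 0.6, 0.2
R = [float(r) for r in range(1, 2 ** L + 1)]   # hypothetical positive rewards
mixed = (2, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0)
uniform = (6,) + (0,) * 15

p3_mixed = p3_select(mixed, R, p_c, alpha, L)
p3_uniform = p3_select(uniform, R, p_c, alpha, L)
lower = (alpha / (1.0 + alpha)) ** L
upper = (1.0 / (1.0 + alpha)) ** L
```

For the uniform population the result depends only on the Hamming distance to the single represented string, and for the mixed population every entry lies between `lower` and `upper`.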
4.3.4  Summary 

The asymptotic behavior of the one-operator simple genetic algorithm is dominated by the states which correspond to uniform populations, the one-operator absorbing states. The algorithm necessarily arrives at some member of the absorbing state set within a finite number of algorithm iterations (Eq. 4.8). The asymptotic probability distribution depends upon the algorithm initial population, $m_0$. This observation is equivalent to the fact, established in Section 4.3.1, that the stationary distribution of the one-operator algorithm is not unique.

A unique stationary distribution exists for the time-homogeneous two and three-operator algorithm variants (with $\alpha > 0$), or equivalently, their asymptotic probability distributions are independent of $m_0$. However, in the $\alpha \to 0^+$ limit, both the two and three-operator algorithms degenerate into the absorbing state behavior which typifies the one-operator case (Eq. 4.17 and Eq. 4.23, 4.27). A very important question is whether the unique stationary distributions of the two and three-operator algorithms approach limits as $\alpha \to 0^+$. Section 7 answers that question affirmatively, and in Section 8, the lower bounds reflected in Eq. 4.21 and Eq. 4.31 are employed to arrive at a monotone decreasing sequence bound on $p_m(k)$ sufficient to guarantee that the limiting distribution is achieved (asymptotically) by the inhomogeneous two and three-operator Markov chains.

The analogous conditional probability arrays $[P_1(i \mid n)]$ and $[P_2'(i \mid n)]$, whose elements are defined by Eq. 4.2 and Eq. 4.22 respectively, play an essential role in the following sections, especially in Section 9. Most of the results developed hereafter apply equally to the two and three-operator algorithm variants by substituting from these conditional probability arrays appropriately. Thus, in much of the following, the notation modifiers are suppressed, so that the elements of either of these arrays are denoted by $P(i \mid n)$, with the specific array reference being determined by context.


SECTION  5 
SOME  EMPIRICAL  RESULTS 

5.1  Overview 

This  section  reports  the  results  of  some  computer  simulations  based  upon  the 
genetic  algorithm  Markov  chain  model  developed  in  Section  4.  Their  purpose  is  to  help 
fix  some  of  the  state  space  and  asymptotic  probability  distribution  ideas  which  are  central 
features  of  this  work. 

The results reported here are separated into four subsections. Section 5.2 concerns enumeration of the state space, $S'$. Section 5.3 is devoted to generation of reward function data, which are subsequently used in the two remaining subsections. Section 5.4 illustrates the behavior of some selected conditional probabilities as a function of the algorithm control parameter, $\alpha$. The results of the primary simulation task are reported in Section 5.5. They concern computation of the three-operator stationary distribution at extremely low (approaching zero) values of the mutation probability control parameter.

One of the significant theoretical results developed in subsequent sections is suggested by the data presented in Section 5.5. It is that the zero mutation probability limiting stationary distribution provides nonzero probability for all states corresponding to uniform populations (i.e. one-operator absorbing states), including those which represent suboptimal solutions. This result poses a complication for the attempt to extrapolate the simulated annealing convergence theory onto the genetic algorithm, as discussed further in Section 5.5.

All simulation results included here were generated on the Cray Y-MP computer at the Eglin AFB, FL Computer Science Directorate. The data presented in Section 5.5 concerning the primary simulation task (the converged limiting stationary distribution results) include some CPU utilization statistics which reflect the approximately 180 hours of CPU time expended in generating those data. The source listings for the programs employed in generating the results of this section are included in Appendix D.

5.2  State  Space  Enumeration 

The  results  appearing  in  this  section  are  of  two  primary  types.  The  first  is  a  table  of 
computed  state  space  cardinality  values,  N',  at  a  variety  of  combinations  of  bit-string 
length,  L,  and  population  size,  M.  These  results  are  products  of  the  program  GET_NPS.F 
appearing  in  Appendix  D.  It  implements  Eq.  4.1.  The  results  are  collected  in  Table  5-1. 

In addition to the N' column, Table 5-1 includes a similar column labeled N". It denotes the cardinality of a space designated S" which is related to S' and whose significance is established in Section 9. Its cardinality is given by


$$N'' = \binom{M + N}{M} = \binom{M + 2^L}{M} \qquad \text{Eq. 5.1}$$


The  data  recorded  in  column  N"  of  Table  5-1  are  computed  from  this  equation  by  the 
program  GET_NPS.F. 

Table 5-1
State Space Cardinality

 M  L    N               N'               N"
 1  1    2                2                3
 1  2    4                4                5
 1  3    8                8                9
 1  4   16               16               17
 1  5   32               32               33
 1  6   64               64               65
 1  7  128              128              129
 1  8  256              256              257
 2  1    2                3                6
 2  2    4               10               15
 2  3    8               36               45
 2  4   16              136              153
 2  5   32              528              561
 2  6   64             2080             2145
 2  7  128             8256             8385
 2  8  256            32896            33153
 3  1    2                4               10
 3  2    4               20               35
 3  3    8              120              165
 3  4   16              816              969
 3  5   32             5984             6545
 3  6   64            45760            47905
 3  7  128           357760           366145
 3  8  256          2829056          2862209
 4  1    2                5               15
 4  2    4               35               70
 4  3    8              330              495
 4  4   16             3876             4845
 4  5   32            52360            58905
 4  6   64           766480           814385
 4  7  128         11716640         12082785
 4  8  256        183181376        186043585
 5  1    2                6               21
 5  2    4               56              126
 5  3    8              792             1287
 5  4   16            15504            20349
 5  5   32           376992           435897
 5  6   64         10424128         11238513
 5  7  128        309319296        321402081
 5  8  256       9525431552       9711475137
 6  1    2                7               28
 6  2    4               84              210
 6  3    8             1716             3003
 6  4   16            54264            74613
 6  5   32          2324784          2760681
 6  6   64        119877472        131115985
 6  7  128       6856577728       7177979809
 6  8  256     414356272512     424067747649
 7  1    2                8               36
 7  2    4              120              330
 7  3    8             3432             6435
 7  4   16           170544           245157
 7  5   32         12620256         15380937
 7  6   64       1198774720       1329890705
 7  7  128     131254487936     138432467745
 7  8  256   15508763342592   15932831090241
 8  1    2                9               45
 8  2    4              165              495
 8  3    8             6435            12870
 8  4   16           490314           735471
 8  5   32         61523748         76904685
 8  6   64      10639125640      11969016345
 8  7  128    2214919483920    2353351951665
 8  8  256  509850594887712  525783425977953

The second set of state space enumeration results reported here is a listing of the elements of S' for a variety of values of M and L. The principal purpose of this particular simulation task is to verify the computer algorithm (and the implementing subprograms) employed to generate the state space vectors for use in the primary simulation task reported in Section 5.5. The data in Tables 5-2, 5-3 and 5-4 below are products of the program GET_SPS.F appearing in Appendix D. The tabulated results are representative of data produced for M and L ranging up to 5 and 4 respectively, for which N' = 15504. In each case generated, the cardinality of the result agreed with that predicted by Eq. 4.1 and recorded in Table 5-1.

Table 5-2
S' at M=2, L=2

2 0 0 0
1 1 0 0
1 0 1 0
1 0 0 1
0 2 0 0
0 1 1 0
0 1 0 1
0 0 2 0
0 0 1 1
0 0 0 2

Table 5-3
S' at M=3, L=2

3 0 0 0
2 1 0 0
2 0 1 0
2 0 0 1
1 2 0 0
1 1 1 0
1 1 0 1
1 0 2 0
1 0 1 1
1 0 0 2
0 3 0 0
0 2 1 0
0 2 0 1
0 1 2 0
0 1 1 1
0 1 0 2
0 0 3 0
0 0 2 1
0 0 1 2
0 0 0 3
Table 5-4
S' at M=2, L=3

2 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0
1 0 0 1 0 0 0 0
1 0 0 0 1 0 0 0
1 0 0 0 0 1 0 0
1 0 0 0 0 0 1 0
1 0 0 0 0 0 0 1
0 2 0 0 0 0 0 0
0 1 1 0 0 0 0 0
0 1 0 1 0 0 0 0
0 1 0 0 1 0 0 0
0 1 0 0 0 1 0 0
0 1 0 0 0 0 1 0
0 1 0 0 0 0 0 1
0 0 2 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 1 0 1 0 0 0
0 0 1 0 0 1 0 0
0 0 1 0 0 0 1 0
0 0 1 0 0 0 0 1
0 0 0 2 0 0 0 0
0 0 0 1 1 0 0 0
0 0 0 1 0 1 0 0
0 0 0 1 0 0 1 0
0 0 0 1 0 0 0 1
0 0 0 0 2 0 0 0
0 0 0 0 1 1 0 0
0 0 0 0 1 0 1 0
0 0 0 0 1 0 0 1
0 0 0 0 0 2 0 0
0 0 0 0 0 1 1 0
0 0 0 0 0 1 0 1
0 0 0 0 0 0 2 0
0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 2

5.3  Reward  Function  Data 
This section presents two sets of reward function data, one for a four-bit optimization problem and another for a five-bit version. Both data sets are products of the program GET_R.F provided in Appendix D. These data sets are employed in the simulations presented in Sections 5.4 and 5.5. Figure 5-1 presents the four-bit function and Figure 5-2 the five-bit version.

In both data sets, the solution state which maximizes the reward value is the $i \in S$ represented by the decimal integer value 12. That is, for the four-bit function, $i_{opt} = 1100$, and its five-bit counterpart is $i_{opt} = 01100$. The reward function value for the arbitrary $i \in S$ is then computed by assigning the value 1 for each length 0, 1 or 2 schema (Section 3.3.1) in agreement with the optimum bit pattern and summing the contributions. Thus, for example, for the four-bit reward function, the bit-string 0000 has function value 4, generated by summing the contributions from the single matching length 0 schema, two matching length 1 schemata and the one matching length 2 schema. A strictly positive reward function is guaranteed since every string matches the single length 0 schema.
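
One reading of this construction can be sketched as follows. It is an interpretation, not a transcription of GET_R.F: the sketch assumes that a length 1 schema is a single bit position and that a length 2 schema is a pair of adjacent bit positions, which reproduces the stated value of 4 for the string 0000 but is otherwise unverified against the plotted data. The function name `reward` is hypothetical.

```python
def reward(i, i_opt, L):
    """1 for the length 0 schema, plus matching bits, plus matching adjacent pairs.

    Assumption: "length 2 schema" means two adjacent bit positions.
    """
    agree = [((i >> b) & 1) == ((i_opt >> b) & 1) for b in range(L)]
    singles = sum(agree)
    pairs = sum(agree[b] and agree[b + 1] for b in range(L - 1))
    return 1 + singles + pairs


L, i_opt = 4, 0b1100
values = [reward(i, i_opt, L) for i in range(2 ** L)]
best = max(range(2 ** L), key=lambda i: values[i])
print(reward(0b0000, i_opt, L), best)
```

Under this reading every value is at least 1, and the maximum is attained only at the string 1100 (decimal 12).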


[Plot of R(i) versus i for the four-bit problem]
Four-Bit Reward Function
Figure 5-1

[Plot of R(i) versus i for the five-bit problem]
Five-Bit Reward Function
Figure 5-2


5.4  Conditional Probabilities Versus $\alpha$

The following four figures present plots of two and three-operator conditional probabilities at two selected current states, $n$. These results are computed from Eq. 4.14 and Eq. 4.24. The plots are generated for the four-bit problem with reward function given in Figure 5-1 and with M = 6. From Table 5-1, the cardinality of S' for these examples is N' = 54264. The conditional probabilities are provided at two selected $n$ vectors, one representing the uniform population $n$ = (6000000000000000) and one the mixed population state $n$ = (2000010001002000), and at three values of the mutation probability parameter. The two and three-operator results are respectively products of the computer programs GET_P2INS and GET_P3INS provided in Appendix D.

The purpose of the tests from which these data are produced is verification of the
computer algorithms (and the implementing subprograms) employed to generate the
conditional probability calculations required by the primary simulation task reported in
Section 5.5. Thus, for example, all conditional probability distributions are uniform at
$\alpha = 1$ as is required by Eq. 4.14 and Eq. 4.24, and for $\alpha \to 0^+$ all
conditional probability distributions approach the one-operator counterparts as is
required by Eq. 4.17 and Eq. 4.27. Also, the two and three-operator conditional
probabilities are identical for the uniform population case (Figures 5-3 and 5-5) as
required by Eq. 4.18 and 4.28, and the three-operator mixed population state case allows
generation of solutions not present in the current population even in the zero mutation
probability limit.
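
Eq. 4.14 and Eq. 4.24 are not reproduced in this section, so the following is only a hedged sketch of why the two limits behave as described. It assumes that each bit of a string is flipped independently with probability $\alpha/(1+\alpha)$, a parameterization chosen here because it is consistent with the $(1+\alpha)^{ML}$ denominators used in Section 7; the function name `mutation_prob` is hypothetical.

```python
from fractions import Fraction


def mutation_prob(i: int, j: int, L: int, alpha: Fraction) -> Fraction:
    """Probability that L-bit string i mutates into j (assumed independent bit flips)."""
    hamming = bin(i ^ j).count("1")              # number of bits that must flip
    return alpha ** hamming / (1 + alpha) ** L


L = 4
for alpha in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    row = [mutation_prob(0b1100, j, L, alpha) for j in range(2 ** L)]
    assert sum(row) == 1                          # each row is a probability distribution
    print(float(alpha), float(row[0b1100]), float(min(row)))
```

Under this assumption every target string has probability $2^{-L}$ at $\alpha = 1$, and the probability of leaving a string unchanged tends to 1 as $\alpha \to 0^+$, which is the qualitative behavior the verification tests look for.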


[Figure 5-3: $P_2(i \mid \bar{n})$ at $\bar{n} = (6000000000000000)$]

[Figure 5-4: $P_2(i \mid \bar{n})$ at $\bar{n} = (2000010001002000)$]

[Figure 5-5: $P_3(i \mid \bar{n})$ at $\bar{n} = (6000000000000000)$]

[Figure 5-6: $P_3(i \mid \bar{n})$ at $\bar{n} = (2000010001002000)$]

5.5 Converged Limiting Stationary Distributions
The following data represent converged three-operator stationary distribution results
for both four and five-bit problems at a variety of population sizes. The results
recorded in Figures 5-7 through 5-16 are products of the computer program GET_3STAT.F
included in Appendix D. They are obtained by repeatedly multiplying a current state
probability vector by the three-operator state transition matrix until a termination
criterion representing approximate convergence is attained. The starting probability
vector is the multinomial distribution corresponding to a uniformly distributed
$P_3(i \mid \bar{n})$ array, and the termination criterion is that the sum of the
probabilities for all nonuniform population states is less than 0.004.
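
The iteration just described can be sketched as follows. This is not GET_3STAT.F: the function name, the 3-state transition matrix and the starting vector are all hypothetical stand-ins, chosen only to show the repeated vector-matrix product and the 0.004 stopping rule on the nonuniform-state probability mass.

```python
def iterate_until_uniform(P, q0, nonuniform, tol=0.004, max_iter=10_000):
    """Repeat q <- q P until the probability mass on `nonuniform` states is below tol."""
    q = list(q0)
    for iteration in range(1, max_iter + 1):
        # Row n of P holds P(m | n), so the update is a row vector times P.
        q = [sum(q[n] * P[n][m] for n in range(len(q))) for m in range(len(q))]
        if sum(q[s] for s in nonuniform) < tol:
            return q, iteration
    raise RuntimeError("termination criterion not reached")


# Hypothetical chain: states 0 and 1 stand in for uniform populations,
# state 2 for a mixed (nonuniform) population.
P = [[0.999, 0.000, 0.001],
     [0.000, 0.999, 0.001],
     [0.500, 0.300, 0.200]]
q, iterations = iterate_until_uniform(P, [1 / 3, 1 / 3, 1 / 3], nonuniform=[2])
print(iterations, q)
```

Note that the stopping rule measures only how much probability remains on nonuniform states; it signals approximate convergence and does not by itself certify that the returned vector is stationary.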

All of the results reported here are for extremely small $\alpha$ (approaching zero) and
thus, as predicted by the model, only the states corresponding to uniform populations
(one-operator absorbing states) have nonzero probability. Consequently, only the final
probabilities for the uniform population states are displayed in Figures 5-7 through 5-16,
with each such state indexed by the decimal integer value corresponding to the solution
represented.

Table  5-5  summarizes  the  Cray  Y-MP  computer  resources  expended  in  generating 
these  data.  Tabulated  there  are  the  number  of  vector  multiplications  (of  dimension  N') 
required  to  attain  the  termination  condition  and  the  CPU  time  utilized.  The  CPU  time  is 
in  seconds,  rounded  to  the  nearest  integer.  The  tabulated  data  are  collected  from  the  log 
files  generated  in  the  computer  runs  which  produced  the  stationary  distribution  data  for 
Figures  5-7  through  5-16. 

The  limiting  distribution  entropy  results  in  Figures  5-17  and  5-18  are  computed 
from  the  converged  stationary  distributions.  The  results  are  recorded  in  bits  and  are 
plotted  as  a  function  of  population  size. 
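
As a small illustration, and assuming that the entropy meant here is the Shannon entropy of the limiting distribution over the uniform population states (the text states only that the results are in bits), the computation can be sketched as below. The function name `entropy_bits` is hypothetical.

```python
import math


def entropy_bits(q):
    """Shannon entropy -sum q_i log2 q_i in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in q if p > 0)


print(entropy_bits([1 / 16] * 16))        # uniform over the 16 four-bit solutions
print(entropy_bits([0.9, 0.05, 0.05]))    # a distribution dominated by one solution
```

Under this definition a distribution spread evenly over all $2^L$ uniform states has entropy L bits, and the entropy falls toward zero as the distribution concentrates on a single solution.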

A very significant result suggested by the limiting stationary distribution data is that
the $\alpha \to 0^+$ value of the stationary distribution is nonzero for all possible
uniform states. This behavior, which is confirmed by theoretical results developed in
Section 7, precludes extrapolation of the simulated annealing global optimality
convergence result onto the genetic algorithm. However, as suggested by the data plotted
in Figures 5-17 and 5-18, it may be possible to approach the desired limiting behavior as
closely as required by adjusting the population size parameter. Those figures indicate
that for sufficiently large values of the population size parameter, the limiting
distribution is dominated by optimal solutions, and that the limiting distribution entropy
decreases monotonically with increasing population size. Results developed in Section 9
reinforce this premise.

[Figure 5-7: Limiting Stationary Distribution at M=2, L=4, q(i) versus i]

[Figure 5-8: Limiting Stationary Distribution at M=3, L=4, q(i) versus i]

[Figure 5-9: Limiting Stationary Distribution at M=4, L=4, q(i) versus i]

[Figure 5-10: Limiting Stationary Distribution at M=5, L=4, q(i) versus i]

[Figure 5-11: Limiting Stationary Distribution at M=6, L=4, q(i) versus i]

[Figure 5-12: Limiting Stationary Distribution at M=7, L=4, q(i) versus i]

[Figure 5-13: Limiting Stationary Distribution at M=2, L=5, q(i) versus i]

[Figure 5-14: Limiting Stationary Distribution at M=3, L=5, q(i) versus i]

[Figure 5-15: Limiting Stationary Distribution at M=4, L=5, q(i) versus i]

[Figure 5-16: Limiting Stationary Distribution at M=5, L=5, q(i) versus i]


Table 5-5
CPU Utilization Statistics

  M    L        N'    Iterations    Seconds
  2    4       136         8           <1
  3    4       816        14            8
  4    4      3876        19           86
  5    4     15504        23         1219
  6    4     54264        27        19048
  7    4    170544        30       219930
  2    5       528         9           13
  3    5      5984        15          301
  4    5     52360        21        11676
  5    5    376992        26           **

**  Not obtained due to unrecoverable log file error
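
The state-space dimensions quoted in this section (for example N' = 54264 at M = 6, L = 4) are consistent with counting the distinct populations of M strings drawn from the $2^L$ possible bit-strings, i.e. the number of multisets of size M over $2^L$ items. The sketch below assumes that this is the counting rule behind Table 5-1, which is not reproduced here; the function name is hypothetical.

```python
from math import comb


def num_population_states(M: int, L: int) -> int:
    """Number of size-M multisets over the 2**L bit-strings of length L."""
    return comb(M + 2 ** L - 1, M)


for M, L in [(2, 4), (6, 4), (2, 5), (5, 5)]:
    print(M, L, num_population_states(M, L))
```

The rapid growth of this count with M and L is what drives the CPU times recorded in the table, since each iteration multiplies a vector of dimension N' by an N' x N' matrix.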

[Figure 5-17: Limiting Distribution Entropy vs Population Size M (Four-Bit Problem)]

[Figure 5-18: Limiting Distribution Entropy vs Population Size M (Five-Bit Problem)]


SECTION  6 
THE  CRAMER'S  RULE  FORMULATION  OF  THE  STATIONARY  DISTRIBUTION 

6.1  Overview 

In  Sections  4.3.2  and  4.3.3,  the  time-homogeneous  two  and  three-operator  simple 
genetic  algorithm  Markov  chains  are  shown  to  possess  unique  stationary  distributions. 
Those  conclusions  are  established  by  invoking  Theorem  A3,  which  asserts  that  in  each 
case  the  stationary  distribution  is  a  left  eigenvector  of  the  state  transition  matrix  and  that 
the  additional  constraint  that  it  be  a  probability  vector  (Definition  A2)  makes  the  solution 
unique. In this section, the existence and uniqueness arguments are refined into a
Cramer's rule formulation of the solution. This development concerns the time-homogeneous
algorithms only, with $\alpha$ constrained to $\alpha > 0$, and it appeals heavily to the
foundation provided in Appendix B.

The product of this development is an expression for the components of the stationary
distribution vector as rational functions generated from the characteristic polynomials
of matrices derived from the state transition matrix. The derived matrices are generated
by setting selected rows of $\mathbf{P}$ to zero. The utility of the approach is that the
form of $\mathbf{P}$ suggests a mechanism for expressing the values of the characteristic
polynomials. Some key intermediate parts of the required methodology are developed in
Section 9, but the effort stops short of explicit solution. However, some very
significant conclusions concerning the asymptotic behavior of the algorithm are obtainable
(Sections 7 and 8) from the results developed here without explicitly solving the system.

6.2 The Stationary Distribution Description

As established in Section 4, implementation of the mutation operator with nonzero
mutation probability (i.e. $\alpha > 0$) implies that for both the two and three-operator
algorithms, $\forall \bar{m}, \bar{n} \in S' : P(\bar{m} \mid \bar{n}) > 0$. Thus, by
Definition B1, $\mathbf{P}^k$ is primitive for any integer $k \ge 1$. Hence, from Section
B.3, the stationary distribution of the two and three-operator simple genetic algorithm
exists, is unique and is a left eigenvector of the state transition matrix corresponding
to eigenvalue 1, i.e.

\[ \bar{q}_\alpha^T \mathbf{P} = \bar{q}_\alpha^T \]

or equivalently

\[ \bar{q}_\alpha^T (\mathbf{P} - \mathbf{I}) = \bar{0}^T. \qquad \text{Eq. 6.1} \]

The following proposition establishes a significant fact concerning the rank of the
matrix $(\mathbf{P} - \mathbf{I})$ in Eq. 6.1.

Proposition 6.1: The rank of the matrix $(\mathbf{P} - \mathbf{I})$ in Eq. 6.1 is exactly
N' - 1 where N' = card(S') is the dimension of $\mathbf{P}$.

This result follows from Theorem B4(f). Its significance is that exactly one column of
the system of equations in Eq. 6.1 can be replaced without sacrificing any of the
constraints which Eq. 6.1 imposes on $\bar{q}_\alpha$. Proposition 6.2 below concerns such
a modification of the system. The modification consists of replacing any column (e.g. the
column indexed by $\bar{n} \in S'$) of Eq. 6.1 by a column corresponding to the constraint

\[ \sum_{\bar{m} \in S'} q_\alpha(\bar{m}) = \bar{q}_\alpha^T \bar{1} = 1, \qquad \text{Eq. 6.2} \]

thus producing a system of the form

\[ \bar{q}_\alpha^T (\mathbf{P} - \mathbf{I})_{\bar{n}} = \bar{e}_{\bar{n}}^T \qquad \text{Eq. 6.3} \]

where $(\mathbf{P} - \mathbf{I})_{\bar{n}}$ is generated by replacing the column of
$(\mathbf{P} - \mathbf{I})$ indexed by $\bar{n} \in S'$ with the vector $\bar{1}$ whose
components all have the value 1, and where $\bar{e}_{\bar{n}}^T$ is the row vector
containing 1 in column $\bar{n}$ and 0's elsewhere.

Proposition 6.2: If the constraint described in Eq. 6.2 is used to replace any column
(e.g. column $\bar{n}$) of the system in Eq. 6.1, the resulting system (Eq. 6.3) is full
rank, or equivalently, $|(\mathbf{P} - \mathbf{I})_{\bar{n}}| \ne 0$.

Since $\mathbf{P}$ is a stochastic matrix (Definition A3), the system of equations in
Eq. 6.1 can be transformed into an equivalent system in which the column indexed by the
arbitrary column index $\bar{n} \in S'$ is represented by the equation
$\bar{q}_\alpha^T \bar{0} = 0$. The required transformation is obtainable by replacing
column $\bar{n}$ by the sum of all columns $\bar{m} \in S'$ (each row of
$\mathbf{P} - \mathbf{I}$ sums to zero), and thus any $\bar{n} \in S'$ is a candidate for
replacement. Proposition 6.2 is then a restatement of Proposition B2 in terms of the
determinant of the matrix of the modified system. It is the essential condition for
justification of the following proposition.

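Proposition 6.3: The components of the stationary distribution can be expressed in the
form

\[ q_\alpha(\bar{m}) = \frac{|(\mathbf{P} - \mathbf{I})_{\bar{n}}^{\bar{m}}|}{|(\mathbf{P} - \mathbf{I})_{\bar{n}}|} \]

where $(\mathbf{P} - \mathbf{I})_{\bar{n}}^{\bar{m}}$ is derived from
$(\mathbf{P} - \mathbf{I})_{\bar{n}}$ by replacing the row of
$(\mathbf{P} - \mathbf{I})_{\bar{n}}$ indexed by $\bar{m} \in S'$ with the row vector
$\bar{e}_{\bar{n}}^T$.

This result is simply an application of Cramer's Rule to the solution of the system in
Eq. 6.3. It applies because $|(\mathbf{P} - \mathbf{I})_{\bar{n}}| \ne 0$ is assured by
Proposition 6.2.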

The equality defined in Proposition 6.3 can be evaluated without computing
$|(\mathbf{P} - \mathbf{I})_{\bar{n}}|$ directly, as suggested by the following
proposition.

Proposition 6.4: The denominator determinant in Proposition 6.3 can be written as

\[ |(\mathbf{P} - \mathbf{I})_{\bar{n}}| = \sum_{\bar{m} \in S'} |(\mathbf{P} - \mathbf{I})_{\bar{n}}^{\bar{m}}|. \]

This result follows from application of elementary column operations on column $\bar{n}$
of $|(\mathbf{P} - \mathbf{I})_{\bar{n}}|$ and employing the definition of
$|(\mathbf{P} - \mathbf{I})_{\bar{n}}^{\bar{m}}|$. The essential step is noting that the
cofactor of each of the (unit) elements in column $\bar{n}$ is equal to the corresponding
$|(\mathbf{P} - \mathbf{I})_{\bar{n}}^{\bar{m}}|$.

Since the numerator determinant defined in Proposition 6.3 is generated from
$(\mathbf{P} - \mathbf{I})_{\bar{n}}$ by replacement of row $\bar{m}$ by the row vector
$\bar{e}_{\bar{n}}^T$, its value is the cofactor of the (unit) element in row $\bar{m}$
and column $\bar{n}$. As indicated in the following proposition, it is equal to the
determinant which results from the corresponding row replacement in
$(\mathbf{P} - \mathbf{I})$.


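Proposition 6.5: The numerator determinant defined in Proposition 6.3 can be written as

\[ |(\mathbf{P} - \mathbf{I})_{\bar{n}}^{\bar{m}}| = |(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}| \]

where $(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}$ is defined as the matrix which results
from $(\mathbf{P} - \mathbf{I})$ by replacing the row indexed by $\bar{m}$ with the row
vector $\bar{e}_{\bar{n}}^T$.

Next, note that if $\bar{m} = \bar{n}$, then
$|(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}|$ can be written as

\[ |(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}| = -|\mathbf{P}_{\bar{m}}^{(0)} - \mathbf{I}| \]

where $\mathbf{P}_{\bar{m}}^{(0)}$ is defined as the matrix which results by replacing
row $\bar{m}$ of $\mathbf{P}$ by the row vector $\bar{0}^T$. If $\bar{m} \ne \bar{n}$,
then by writing the replacement row in $|(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}|$ as

\[ \bar{e}_{\bar{n}}^T = [\bar{e}_{\bar{n}}^T - \bar{e}_{\bar{m}}^T + \bar{e}_{\bar{m}}^T] = [\bar{e}_{\bar{n}}^T - \bar{e}_{\bar{m}}^T - (-\bar{e}_{\bar{m}}^T)], \]

$|(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}|$ can be written as the difference of two
determinants derived from $(\mathbf{P} - \mathbf{I})$, one with the $\bar{m}$-th row
replaced by $[\bar{e}_{\bar{n}}^T - \bar{e}_{\bar{m}}^T]$ and the second with the
$\bar{m}$-th row replaced by $[-\bar{e}_{\bar{m}}^T]$. The $-\bar{e}_{\bar{m}}^T$ term in
each row replacement provides the necessary principal diagonal contribution to permit
expression of $|(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}|$ as

\[ |(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}| = |\mathbf{P}_{\bar{m}}^{(\bar{n})} - \mathbf{I}| - |\mathbf{P}_{\bar{m}}^{(0)} - \mathbf{I}| \]

where $\mathbf{P}_{\bar{m}}^{(0)}$ is defined as before and where
$\mathbf{P}_{\bar{m}}^{(\bar{n})}$ is defined as the matrix which results by replacing
row $\bar{m}$ of $\mathbf{P}$ by the row vector $\bar{e}_{\bar{n}}^T$.

This result can be further reduced by noting that the row replacement by which
$\mathbf{P}_{\bar{m}}^{(\bar{n})}$ is generated from $\mathbf{P}$ preserves the row sum
constraint (i.e. $\mathbf{P}_{\bar{m}}^{(\bar{n})}$ is a stochastic matrix). Thus, 1 is
an eigenvalue of $\mathbf{P}_{\bar{m}}^{(\bar{n})}$ (Definition A3), from which it
follows that $|\mathbf{P}_{\bar{m}}^{(\bar{n})} - \mathbf{I}| = 0$. Consequently, the
$\bar{m} = \bar{n}$ and $\bar{m} \ne \bar{n}$ cases can be assembled as indicated by the
following proposition.

Proposition 6.6: The determinant $|(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}|$ defined
in Proposition 6.5 can be written as

\[ |(\mathbf{P} - \mathbf{I})^{\bar{m}\bar{n}}| = -|\mathbf{P}_{\bar{m}}^{(0)} - \mathbf{I}|. \]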


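By collecting the results of Propositions 6.3 through 6.6 and noting that the superscript
in $\mathbf{P}_{\bar{m}}^{(0)}$ is now superfluous, the components of the stationary
distribution can be written as indicated in the following proposition.

Proposition 6.7: The components of the stationary distribution can be expressed in the
form

\[ q_\alpha(\bar{m}) = \frac{-|\mathbf{P}_{\bar{m}} - \mathbf{I}|}{\sum_{\bar{n} \in S'} -|\mathbf{P}_{\bar{n}} - \mathbf{I}|} = \frac{|\mathbf{P}_{\bar{m}} - \mathbf{I}|}{\sum_{\bar{n} \in S'} |\mathbf{P}_{\bar{n}} - \mathbf{I}|} \]

where $\mathbf{P}_{\bar{m}}$ and $\mathbf{P}_{\bar{n}}$ are derived from $\mathbf{P}$ by
replacing the rows indexed by $\bar{m}$ and $\bar{n}$ respectively with the row vector
$\bar{0}^T$.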
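
The row-zeroing formula of Proposition 6.7 can be checked numerically on a small example. The sketch below is illustrative only: the 3-state strictly positive stochastic matrix is hypothetical (it is not a genetic algorithm transition matrix), the helper names are invented here, and exact rational arithmetic is used so that the stationarity condition of Eq. 6.1 can be tested with equality.

```python
from fractions import Fraction as F


def det(A):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** c * A[0][c] * det([row[:c] + row[c + 1:] for row in A[1:]])
               for c in range(len(A)))


def det_row_zeroed_minus_identity(P, m):
    """|P_m - I|, where P_m is P with row m replaced by zeros."""
    n = len(P)
    return det([[(F(0) if r == m else P[r][c]) - (1 if r == c else 0)
                 for c in range(n)] for r in range(n)])


def stationary(P):
    """Stationary distribution via q(m) = |P_m - I| / sum_n |P_n - I|."""
    dets = [det_row_zeroed_minus_identity(P, m) for m in range(len(P))]
    return [d / sum(dets) for d in dets]


# Hypothetical strictly positive stochastic matrix; row n holds P(m | n).
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 10), F(3, 10), F(3, 5)]]
q = stationary(P)
assert sum(q) == 1
assert all(sum(q[n] * P[n][m] for n in range(3)) == q[m] for m in range(3))
print(q)
```

The two assertions confirm that the vector produced by the determinant ratios is a probability vector and a left eigenvector of P for eigenvalue 1. For matrices of realistic dimension N' this route is far more expensive than iteration; its value in the text is analytical rather than computational.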

Thus, computing the stationary distribution components reduces to evaluating the
characteristic polynomials of the $\mathbf{P}_{\bar{m}}$'s at $\lambda = 1$ (i.e.
$\phi_{\bar{m}}(\lambda) = |\mathbf{P}_{\bar{m}} - \lambda\mathbf{I}| \Rightarrow |\mathbf{P}_{\bar{m}} - \mathbf{I}| = \phi_{\bar{m}}(1)$).
Also, since 1 is an eigenvalue of $\mathbf{P}$ it follows that
$\phi(1) = |\mathbf{P} - \mathbf{I}| = 0$, which suggests the following alternative to
Proposition 6.7. Its usefulness is established in Sections 9.3-9.4.

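Proposition 6.8: The components of the stationary distribution can be expressed in the
alternative form

\[ q_\alpha(\bar{m}) = \frac{|\mathbf{P} - \mathbf{I}| - |\mathbf{P}_{\bar{m}} - \mathbf{I}|}{\sum_{\bar{n} \in S'} \left( |\mathbf{P} - \mathbf{I}| - |\mathbf{P}_{\bar{n}} - \mathbf{I}| \right)} \]

where as before $\mathbf{P}_{\bar{m}}$ and $\mathbf{P}_{\bar{n}}$ are derived from
$\mathbf{P}$ by replacing the rows indexed by $\bar{m}$ and $\bar{n}$ respectively with
the row vector $\bar{0}^T$.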

6.3  Positivity  of  the  Stationary  Distribution  Components 
Strict positivity of the stationary distribution components can be deduced from Theorem
B4 and the form of $\mathbf{P}_{\bar{m}}$. Every element of $\mathbf{P}_{\bar{m}}$ in
every row other than row $\bar{m}$ is identical to the corresponding element of
$\mathbf{P}$, while those in row $\bar{m}$ are zero. This is expressed in the nonnegative
matrix notation of Appendix B as $0 \le \mathbf{P}_{\bar{m}} \le \mathbf{P}$. Further,
since $\mathbf{P}_{\bar{m}}$ and $\mathbf{P}$ differ in row $\bar{m}$,
$\mathbf{P}_{\bar{m}} \ne \mathbf{P}$, and consequently by Theorem B4(e), every
eigenvalue $\lambda_i$ of $\mathbf{P}_{\bar{m}}$ satisfies $|\lambda_i| < 1$. It follows
that for $\lambda \ge 1$, (1)
$|\mathbf{P}_{\bar{m}} - \lambda\mathbf{I}| = \prod_i (\lambda_i - \lambda) \ne 0$ and (2)
the algebraic sign of $|\mathbf{P}_{\bar{m}} - \lambda\mathbf{I}|$ is $(-1)^{N'}$ for all
$\bar{m} \in S'$. Specializing these arguments to the case $\lambda = 1$ yields the
following proposition.

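Proposition 6.9: For all $\alpha > 0$, the value of the determinant
$|\mathbf{P}_{\bar{m}} - \mathbf{I}|$ satisfies
$\forall \bar{m} \in S'$:  (1) $|\mathbf{P}_{\bar{m}} - \mathbf{I}| \ne 0$ and
                           (2) the algebraic sign of $|\mathbf{P}_{\bar{m}} - \mathbf{I}|$ is $(-1)^{N'}$.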

An immediate consequence of Proposition 6.9 is that both numerator and denominator of
the expression for $q_\alpha(\bar{m})$ in Proposition 6.7 are nonzero and have identical
algebraic sign. Strict positivity of the stationary distribution components follows from
these observations. That is, $\forall \bar{m} \in S' : q_\alpha(\bar{m}) > 0$.
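
The sign rule can be spot-checked on small examples. The sketch below is an illustration under stated assumptions: the strictly positive stochastic matrices are generated by an arbitrary, hypothetical weighting scheme (they are not genetic algorithm transition matrices), and the helper names are invented here. For each matrix of dimension n it confirms that every row-zeroed determinant is nonzero and is positive exactly when n is even.

```python
from fractions import Fraction


def det(A):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** c * A[0][c] * det([row[:c] + row[c + 1:] for row in A[1:]])
               for c in range(len(A)))


def positive_stochastic(n):
    """A hypothetical n x n stochastic matrix with strictly positive entries."""
    weights = [[1 + (r * c + r + c) % 4 for c in range(n)] for r in range(n)]
    return [[Fraction(w, sum(row)) for w in row] for row in weights]


def row_zeroed_dets(P):
    """[|P_m - I| for every m], where P_m is P with row m set to zero."""
    n = len(P)
    return [det([[(0 if r == m else P[r][c]) - (1 if r == c else 0)
                  for c in range(n)] for r in range(n)]) for m in range(n)]


for n in (3, 4):
    dets = row_zeroed_dets(positive_stochastic(n))
    # Sign (-1)^n: negative for odd n, positive for even n.
    assert all(d != 0 and (d > 0) == (n % 2 == 0) for d in dets)
    print(n, [float(d) for d in dets])
```

Because all the determinants share one sign, their ratios in Proposition 6.7 are strictly positive, which is the content of the positivity statement above.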

6.4 The Indeterminate Form at $\alpha = 0$

All of the results established in this section assume that the mutation probability
parameter is strictly positive ($\alpha > 0$), and thus are not applicable at
$\alpha = 0$. The reason is apparent when Eq. 4.7 and the two-operator result in Eq. 4.17
(or the three-operator counterparts of Eq. 4.17 given by Eq. 4.23 and 4.27) are applied
to $|\mathbf{P}_{\bar{m}} - \mathbf{I}|$. It follows that the row of the
$\alpha \to 0^+$ limit of $|\mathbf{P}_{\bar{m}} - \mathbf{I}|$ corresponding to the
one-operator absorbing state $\bar{n}_A \in S_A'$, $\bar{n}_A \ne \bar{m}$ is zero. That
is, the only nonzero entry in row $\bar{n}_A$ of the $\alpha \to 0^+$ limit of
$\mathbf{P}_{\bar{m}}$ is the principal diagonal element

\[ \lim_{\alpha \to 0^+} P_{\bar{m}}(\bar{n}_A \mid \bar{n}_A) = \lim_{\alpha \to 0^+} P(\bar{n}_A \mid \bar{n}_A) = P_1(\bar{n}_A \mid \bar{n}_A) = 1, \]

which is cancelled by the corresponding principal diagonal element in $-\mathbf{I}$. Thus,

\[ \forall \bar{m} \in S' : \lim_{\alpha \to 0^+} |\mathbf{P}_{\bar{m}} - \mathbf{I}| = 0, \]

and consequently Propositions 6.7 and 6.8 yield indeterminate forms. However, as
demonstrated in the following section, it is possible to verify that a limiting
stationary distribution vector exists for the time-homogeneous two and three-operator
algorithms.
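
The 0/0 behavior, and the fact that the ratio can nevertheless settle to a limit, can be seen on a toy chain. Everything in the sketch below is hypothetical: the 3-state matrix `toy_chain` is not a genetic algorithm transition matrix; it only mimics the structure described above, with two rows (states 0 and 1) that become absorbing as $\alpha \to 0^+$ and one row (state 2) that does not depend on $\alpha$.

```python
from fractions import Fraction


def det(A):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** c * A[0][c] * det([row[:c] + row[c + 1:] for row in A[1:]])
               for c in range(len(A)))


def toy_chain(alpha):
    """Hypothetical chain: rows 0 and 1 tend to unit vectors as alpha -> 0+."""
    d = (1 + alpha) ** 2
    return [[1 / d, alpha ** 2 / d, 2 * alpha / d],
            [alpha ** 2 / d, 1 / d, 2 * alpha / d],
            [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]]


def row_zeroed_dets(P):
    """[|P_m - I| for every m], where P_m is P with row m set to zero."""
    n = len(P)
    return [det([[(0 if r == m else P[r][c]) - (1 if r == c else 0)
                  for c in range(n)] for r in range(n)]) for m in range(n)]


for alpha in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)):
    dets = row_zeroed_dets(toy_chain(alpha))
    q = [d / sum(dets) for d in dets]
    print(float(alpha), [float(d) for d in dets], [round(float(x), 5) for x in q])
```

In this toy example every determinant shrinks toward zero with $\alpha$, while the ratios approach (5/8, 3/8, 0): both near-absorbing states retain nonzero probability in the limit and the remaining state does not.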


SECTION  7 
THE  ZERO  MUTATION  PROBABILITY  STATIONARY  DISTRIBUTION  LIMIT 

7.1  Overview 

In Section 4.3.1, it is established that the time-homogeneous one-operator genetic
algorithm Markov chain possesses a stationary distribution but that it is not unique. In
Sections 4.3.2 and 4.3.3, it is established that the time-homogeneous two and
three-operator counterparts possess unique stationary distributions provided
$\alpha > 0$, and Section 6 formulates the existence and uniqueness argument into a
rational function expression for the unique solution. Since the two-operator state
transition matrix approaches its one-operator counterpart as $\alpha \to 0^+$ (Eq. 4.17)
and since the three-operator algorithm exhibits the corresponding behavior with respect
to the $P_2'(i \mid \bar{n})$'s (Eq. 4.23), a question which naturally arises from these
observations is whether an $\alpha \to 0^+$ limiting distribution exists for the two and
three-operator algorithms. (If such a limit exists, then it is necessarily unique.) This
section answers that question affirmatively and also confirms the observation made in
Section 5.5 that the limiting distribution is nonzero for all states corresponding to
uniform populations (absorbing states).

The approach taken here is to transform the expressions for $q_\alpha(\bar{m})$ in
Propositions 6.7 and 6.8 into equivalent expressions which yield determinate forms at
$\alpha = 0$. The result requires transforming $\mathbf{P}$ and $\mathbf{P}_{\bar{m}}$
into related matrices but with the states corresponding to uniform populations
(one-operator absorbing states) coalesced into adjacent nonuniform population states. The
development is tedious and involves some additional notation.

7.2  Functional  Form  of  the  Stationary  Distribution 

Before proceeding with the limiting case development which is the primary purpose of
this section, it is convenient to establish some intermediate results concerning the
behavior of $\bar{q}_\alpha$ as a function of $\alpha$. These results follow from the
results developed in Section 6 and some simple observations about the form of the
elements of $\mathbf{P}$.

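From Eq. 4.14-4.16 and Eq. 4.22-4.26, all elements of the state transition matrix are
rational functions of $\alpha$ with denominator polynomial $(1+\alpha)^{ML}$. Thus, for
$\alpha > 0$

\[ (1+\alpha)^{MLN'} |\mathbf{P}_{\bar{m}} - \mathbf{I}| = |(1+\alpha)^{ML}\mathbf{P}_{\bar{m}} - (1+\alpha)^{ML}\mathbf{I}| = |\mathbf{Q}_{\bar{m}} - (1+\alpha)^{ML}\mathbf{I}| \]

where every element of $\mathbf{Q}_{\bar{m}}$ (and hence the value of
$|\mathbf{Q}_{\bar{m}} - (1+\alpha)^{ML}\mathbf{I}|$) is a polynomial in $\alpha$.
Further, since row $\bar{m}$ in $\mathbf{Q}_{\bar{m}}$ is zero, the polynomial value of
the determinant includes the factor $(1+\alpha)^{ML}$. Consequently

\[ (1+\alpha)^{ML(N'-1)} |\mathbf{P}_{\bar{m}} - \mathbf{I}| = \theta_{\bar{m}}(\alpha) \qquad \text{Eq. 7.1} \]

for $\theta_{\bar{m}}(\alpha)$ some polynomial function of $\alpha$. Proposition 7.1
below follows.

Proposition 7.1: For all $\alpha > 0$, the value of the determinant
$|\mathbf{P}_{\bar{m}} - \mathbf{I}|$ is a rational function of $\alpha$ with nonzero
denominator polynomial $(1+\alpha)^{ML(N'-1)}$.

By applying Eq. 7.1 to Proposition 6.7, the components of $\bar{q}_\alpha$ can be written
as

\[ q_\alpha(\bar{m}) = \frac{\theta_{\bar{m}}(\alpha)}{\sum_{\bar{n} \in S'} \theta_{\bar{n}}(\alpha)} = \frac{\theta_{\bar{m}}(\alpha)}{\Theta(\alpha)}. \qquad \text{Eq. 7.2} \]

Hence, the $q_\alpha(\bar{m})$ are rational functions of $\alpha$, and since a rational
function is continuous everywhere its denominator polynomial is nonzero, application of
Proposition 6.9 and Eq. 7.1 (which together establish that
$\Theta(\alpha) = \sum \theta_{\bar{n}}(\alpha) \ne 0$) to Eq. 7.2 yields the following.

Proposition 7.2: For all $\alpha > 0$, the components of $\bar{q}_\alpha$ are continuous
rational functions of the independent variable $\alpha$.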

Further, differentiation of Eq. 7.2 with respect to $\alpha$ yields a rational function
of $\alpha$

\[ \frac{dq_\alpha(\bar{m})}{d\alpha} = \frac{1}{\Theta(\alpha)^2} \left[ \Theta(\alpha)\frac{d\theta_{\bar{m}}(\alpha)}{d\alpha} - \theta_{\bar{m}}(\alpha)\frac{d\Theta(\alpha)}{d\alpha} \right] \qquad \text{Eq. 7.3} \]

with nonzero denominator polynomial $\Theta(\alpha)^2$. The following proposition is a
consequence.

Proposition 7.3: For all $\alpha > 0$, the components of the first derivative of
$\bar{q}_\alpha$ with respect to $\alpha$ are continuous rational functions of $\alpha$.


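7.3 The Absorbing State Rows of $|\mathbf{P} - \mathbf{I}|$ and $|\mathbf{P}_{\bar{m}} - \mathbf{I}|$

The rows corresponding to one-operator absorbing states in the determinant
$|\mathbf{P} - \mathbf{I}|$ have a particularly simple form. The nondiagonal elements of
row $\bar{n}_A \in S_A'$, which represents a uniform population of solutions $i_A \in S$,
are given by Eq. 4.19 and 4.29 respectively for the two and three-operator cases. The
principal diagonal element is obtained by evaluating Eq. 4.19 or 4.29 at
$\bar{m} = \bar{n}_A$ and subtracting 1. Thus,

\[ P(\bar{n}_A \mid \bar{n}_A) - 1 = \frac{1}{(1+\alpha)^{ML}} - 1 = \frac{1 - 1 - ML\alpha - ML(ML-1)\alpha^2/2 - \cdots - \alpha^{ML}}{(1+\alpha)^{ML}} = \frac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}}, \]

and if the general element in $|\mathbf{P} - \mathbf{I}|$ is denoted by
$T(\bar{m} \mid \bar{n})$, then the elements of row $\bar{n}_A$ can be written as

\[ \forall \bar{n}_A \in S_A' : T(\bar{n} \mid \bar{n}_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & \bar{n} = \bar{n}_A \\[2ex] P(\bar{n} \mid \bar{n}_A) \ \text{(Eq. 4.19 or 4.29)} & \bar{n} \in S' - \{\bar{n}_A\} \end{cases} \qquad \text{Eq. 7.4} \]
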
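
The binomial expansion behind the diagonal entry can be verified exactly. The sketch below assumes, as in the expansion above, that $P(\bar{n}_A \mid \bar{n}_A) = (1+\alpha)^{-ML}$; the function names are hypothetical. It checks that $(1+\alpha)^{ML}$ times the diagonal element of $\mathbf{P} - \mathbf{I}$ equals $-ML\alpha$ plus higher-order terms, so that the leading coefficient is $-ML$.

```python
from fractions import Fraction
from math import comb


def diagonal_numerator(M, L, alpha):
    """(1 + alpha)^(ML) * (P(n_A | n_A) - 1), assuming P(n_A | n_A) = (1 + alpha)^(-ML)."""
    return 1 - (1 + alpha) ** (M * L)


def binomial_expansion(M, L, alpha):
    """-ML*alpha - C(ML, 2)*alpha^2 - ... - alpha^(ML)."""
    return -sum(comb(M * L, k) * alpha ** k for k in range(1, M * L + 1))


M, L = 6, 4
for alpha in (Fraction(1, 10), Fraction(1, 1000)):
    assert diagonal_numerator(M, L, alpha) == binomial_expansion(M, L, alpha)
    ratio = diagonal_numerator(M, L, alpha) / alpha
    print(float(alpha), float(ratio))     # tends toward -ML = -24 as alpha shrinks
```

The ratio printed on each line is the numerator divided by $\alpha$; its approach to $-ML$ is what the $O(\alpha^2)$ notation in Eq. 7.4 summarizes.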

Additional insight into the form of the absorbing state rows can be obtained with the aid
of the following notation. Let $\bar{m}_A, \bar{n}_A \in S_A'$ be distinct but otherwise
arbitrary absorbing states of the one-operator Markov chain, let $i_A \in S$ be the
bit-string represented in $\bar{n}_A$ and let $S(i_A) \subset S$ be the set of
bit-strings accessible from $i_A$ via exactly one bit mutation event (i.e.
$S(i_A) = \{ i_1 : i_1 \in S, H(i_1, i_A) = 1 \}$). It follows from this definition that
$\mathrm{card}(S(i_A)) = L$. Then, for $M > 1$ let $S(\bar{n}_A)'$ be defined as

\[ S(\bar{n}_A)' = \{ \bar{n} : \bar{n} \in S', n(i_A) = M - 1, n(i_1) = 1, i_1 \in S(i_A) \} \subset S', \]

the set of nonabsorbing states adjacent to the absorbing state $\bar{n}_A$. The
restriction on M is required to ensure that no absorbing state $\bar{m}_A$ is contained
in the adjacency set of any absorbing state $\bar{n}_A$. $S(\bar{n}_A)'$ includes exactly
one distinct element for each $i_1 \in S(i_A)$, and consequently
$\mathrm{card}(S(\bar{n}_A)') = \mathrm{card}(S(i_A)) = L$. Also, from the form of
$S(\bar{n}_A)'$, it follows that for $M \ge 3$, $S(\bar{m}_A)'$ and $S(\bar{n}_A)'$ are
disjoint if $\bar{m}_A$ and $\bar{n}_A$ are distinct one-operator absorbing states. Thus,
if $S_A''$ is defined as

\[ S_A'' = \bigcup_{\bar{n}_A \in S_A'} S(\bar{n}_A)' \]

and $M \ge 3$, then $\mathrm{card}(S_A'') = \mathrm{card}(S_A') \times L = NL$. This
restriction on M is assumed in all of the following.
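
The adjacency sets can be enumerated directly for small problems. In the sketch below a population state is represented as a tuple of counts indexed by the decimal value of each bit-string; this representation and the function names are assumptions made for illustration. The example confirms that each adjacency set has L members, that the union has NL members when M = 3, and that the sets overlap when M = 2.

```python
def adjacent_states(i_A, M, L):
    """S(n_A)': populations holding M-1 copies of i_A and one copy of a one-bit neighbour."""
    if M < 2:
        raise ValueError("M > 1 is required")
    states = set()
    for bit in range(L):
        counts = [0] * 2 ** L
        counts[i_A] = M - 1
        counts[i_A ^ (1 << bit)] = 1      # neighbour at Hamming distance 1
        states.add(tuple(counts))
    return states


def adjacency_union(M, L):
    """Union of S(n_A)' over all N = 2**L uniform (absorbing) states."""
    union = set()
    for i_A in range(2 ** L):
        union |= adjacent_states(i_A, M, L)
    return union


L = 4
N = 2 ** L
print(len(adjacent_states(12, 6, L)))     # L members
print(len(adjacency_union(3, L)), N * L)  # disjoint sets: N * L members
print(len(adjacency_union(2, L)))         # M = 2: each state is shared by two sets
```

For M = 2 a population containing one copy each of two neighbouring strings is adjacent to both corresponding uniform states, which is why the disjointness claim needs at least three population members.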

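With the aid of the new notation, the element in column $\bar{n} \in S(\bar{n}_A)'$ of
row $\bar{n}_A$ in $|\mathbf{P} - \mathbf{I}|$ can be written as

\[ \forall \bar{n} \in S(\bar{n}_A)' : T(\bar{n} \mid \bar{n}_A) = P(\bar{n} \mid \bar{n}_A) = \binom{M}{M-1} \frac{\alpha}{(1+\alpha)^{ML}} = \binom{M}{1} \frac{\alpha}{(1+\alpha)^{ML}} = \frac{M\alpha}{(1+\alpha)^{ML}}. \]
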
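Thus, Eq. 7.4 can be revised as follows

\[ \forall \bar{n}_A \in S_A' : T(\bar{n} \mid \bar{n}_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & \bar{n} = \bar{n}_A \\[2ex] \dfrac{M\alpha}{(1+\alpha)^{ML}} & \bar{n} \in S(\bar{n}_A)' \\[2ex] \dfrac{O(\alpha^s)}{(1+\alpha)^{ML}} & \bar{n} \in S' - S(\bar{n}_A)' - \{\bar{n}_A\} \end{cases} \qquad \text{Eq. 7.5} \]

where the exponent s of $\alpha$ in the order expression for the general term is an
integer satisfying $s \ge 2$. The elements in columns $\bar{n}_A$ and
$\bar{n} \in S(\bar{n}_A)'$ are first order in $\alpha$ while the elements in all other
columns are at least second order.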

Eq. 7.5 applies to every absorbing state row of $|\mathbf{P}_{\bar{m}} - \mathbf{I}|$ as
well if $\bar{m} \notin S_A'$. If $|\mathbf{P}_{\bar{m}_A} - \mathbf{I}|$ is being
considered where $\bar{m}_A \in S_A'$, then row $\bar{m}_A$ contains -1 at its principal
diagonal and zeros elsewhere. In that case Eq. 7.5 only applies to the absorbing state
rows $\bar{n}_A \in S_A' - \{\bar{m}_A\}$. Exactly N - 1 such rows exist in
$|\mathbf{P}_{\bar{m}_A} - \mathbf{I}|$.

By applying Eq. 7.5 and these observations to Proposition 7.1, it follows that the lowest
order term with nonzero coefficient which can conceivably exist in the numerator
polynomial of $|\mathbf{P}_{\bar{m}_A} - \mathbf{I}|$ is the order $\alpha^{N-1}$ term.
Similar reasoning reveals that the corresponding lowest order term with nonzero
coefficient for $|\mathbf{P}_{\bar{m}} - \mathbf{I}|$ with $\bar{m} \notin S_A'$ is the
order $\alpha^N$ term. If the coefficient of the order $\alpha^{N-1}$ term in the
numerator polynomial of $|\mathbf{P}_{\bar{m}_A} - \mathbf{I}|$ is indeed nonzero, and if
the corresponding coefficients for all such $\bar{m}_A$ have the same algebraic sign,
then the required limiting value of $\bar{q}_\alpha$ can be expressed in terms of these
nonzero coefficients via substitution into Proposition 6.7. These conditions are in fact
satisfied as demonstrated below.

7.4  Reformulation  of  Propositions  6.7  and  6.8 

The next step in this development is the definition of some auxiliary matrices related to
$\mathbf{P}$ and $\mathbf{P}_{\bar{m}_A}$ and the reformulation of Propositions 6.7 and
6.8 in terms of them. The new matrices, designated $\mathbf{P}(\bar{m}_A)'$ and
$\mathbf{P}_{\bar{m}_A}'$ respectively, are derived by coalescing each of the N - 1
absorbing state columns $\bar{n}_A \in S_A' - \{\bar{m}_A\}$ of
$|\mathbf{P} - \mathbf{I}|$ and $|\mathbf{P}_{\bar{m}_A} - \mathbf{I}|$ with its
neighboring nonabsorbing state columns, $\bar{n} \in S(\bar{n}_A)'$. Specifically, let
$|\mathbf{Q}_{\bar{m}_A}|$ be derived from $|\mathbf{P}_{\bar{m}_A} - \mathbf{I}|$ by
adding 1/L times the column $\bar{n}_A \in S_A' - \{\bar{m}_A\}$ to each of the L
adjacent nonabsorbing columns $\bar{n} \in S(\bar{n}_A)'$ and repeating the process for
each remaining $\bar{n}_A \in S_A' - \{\bar{m}_A\}$. This operation is applied once each
for the exactly N - 1 absorbing state columns $\bar{n}_A \in S_A' - \{\bar{m}_A\}$ and it
preserves the value of the determinant
$|\mathbf{Q}_{\bar{m}_A}| = |\mathbf{P}_{\bar{m}_A} - \mathbf{I}|$.
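
The coalescing step relies only on the elementary fact that adding a multiple of one column to other columns leaves a determinant unchanged. A minimal sketch follows; the 4 x 4 matrix is a hypothetical stand-in (its last row is -1 on the diagonal and zero elsewhere, like row $\bar{m}_A$ of $\mathbf{P}_{\bar{m}_A} - \mathbf{I}$), and the function name `coalesce` is invented here.

```python
from fractions import Fraction


def det(A):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** c * A[0][c] * det([row[:c] + row[c + 1:] for row in A[1:]])
               for c in range(len(A)))


def coalesce(A, source, targets):
    """Add 1/len(targets) times column `source` to each column listed in `targets`."""
    weight = Fraction(1, len(targets))
    return [[row[c] + (weight * row[source] if c in targets else 0)
             for c in range(len(row))] for row in A]


A = [[Fraction(v) for v in row] for row in
     [[-3, 1, 1, 0],
      [2, -5, 2, 0],
      [1, 1, -4, 1],
      [0, 0, 0, -1]]]
B = coalesce(A, source=0, targets=[1, 2])   # column 0 plays the absorbing-state column
assert B != A
assert det(B) == det(A)
print(det(A), det(B))
```

In the text the same operation is applied once for each of the N - 1 absorbing state columns with the L adjacent columns as targets, so the determinant value is preserved at every step.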

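If now $Q_{\bar{m}_A}(\bar{m} \mid \bar{n})$ denotes the general element of
$|\mathbf{Q}_{\bar{m}_A}|$, then by applying the recipe used in its construction and
Eq. 7.5, the elements in the absorbing state rows $\bar{n}_A \in S_A' - \{\bar{m}_A\}$ of
$\mathbf{Q}_{\bar{m}_A}$ can be written as

\[ \forall \bar{m}_A \in S_A', \forall \bar{n}_A \in S_A' - \{\bar{m}_A\} : Q_{\bar{m}_A}(\bar{m} \mid \bar{n}_A) = \begin{cases} \dfrac{-ML\alpha + O(\alpha^2)}{(1+\alpha)^{ML}} & \bar{m} = \bar{n}_A \\[2ex] \dfrac{O(\alpha^2)}{(1+\alpha)^{ML}} & \bar{m} \in S(\bar{n}_A)' \\[2ex] \dfrac{O(\alpha^s)}{(1+\alpha)^{ML}} & \bar{m} \in S' - S(\bar{n}_A)' - \{\bar{n}_A\} \end{cases} \]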

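where as before s is an integer satisfying $s \ge 2$. Thus, each of the N - 1 absorbing
state rows $\bar{n}_A \in S_A' - \{\bar{m}_A\}$ of $|\mathbf{Q}_{\bar{m}_A}|$ can be
written as a sum of two rows, one row containing $-ML\alpha/(1+\alpha)^{ML}$ at its
principal diagonal location and zeros elsewhere and the second row being a multiple of
$\alpha^2/(1+\alpha)^{ML}$. It follows from elementary determinant row expansion
operations that $|\mathbf{P}_{\bar{m}_A} - \mathbf{I}| = |\mathbf{Q}_{\bar{m}_A}|$ can be
written as

\[ |\mathbf{P}_{\bar{m}_A} - \mathbf{I}| = |\mathbf{Q}_{\bar{m}_A}| = \frac{(-ML\alpha)^{N-1} |\mathbf{Q}_{\bar{m}_A}'|}{(1+\alpha)^{ML(N-1)}} + \frac{O(\alpha^s)}{(1+\alpha)^{MLN'}} \qquad \text{Eq. 7.6} \]

where $|\mathbf{Q}_{\bar{m}_A}'|$ is the order N' - N + 1 principal minor of
$|\mathbf{Q}_{\bar{m}_A}|$ generated by deleting the N - 1 row/column pairs which
intersect on the $\bar{n}_A \in S_A' - \{\bar{m}_A\}$ principal diagonals and where the
exponent s of $\alpha$ in the Eq. 7.6 order expression is an integer satisfying
$s \ge N$.

The elements in all rows of $|\mathbf{Q}_{\bar{m}_A}'|$ except row $\bar{m}_A$ are
composed of contributions from the elements in nonabsorbing state rows of $\mathbf{P}$
and the -1 principal diagonal term contributed by $\mathbf{I}$ in
$|\mathbf{P}_{\bar{m}_A} - \mathbf{I}|$. Row $\bar{m}_A$ of $|\mathbf{Q}_{\bar{m}_A}'|$
contains -1 at its principal diagonal location and zeros elsewhere. Thus, if
$\mathbf{Q}_{\bar{m}_A}'$ is written as
$\mathbf{Q}_{\bar{m}_A}' = \mathbf{P}_{\bar{m}_A}' - \mathbf{I}'$, then from the recipe
employed in its construction, it follows that the square matrix $\mathbf{P}_{\bar{m}_A}'$
thus defined has dimension N' - N + 1 and that its elements are given by

\[ \forall \bar{m}, \bar{n} \in S' - S_A' + \{\bar{m}_A\} : P_{\bar{m}_A}(\bar{m} \mid \bar{n})' = \begin{cases} 0 & \bar{n} = \bar{m}_A \\[1ex] P(\bar{m} \mid \bar{n}) & \bar{n} \ne \bar{m}_A,\ \bar{m} \in S' - S_A' + \{\bar{m}_A\} + S(\bar{m}_A)' - S_A'' \\[1ex] P(\bar{m} \mid \bar{n}) + \dfrac{1}{L} P(\bar{n}_A \mid \bar{n}) & \bar{n} \ne \bar{m}_A,\ \bar{m} \in S(\bar{n}_A)' \end{cases} \qquad \text{Eq. 7.7} \]
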

Careful examination of Eq. 7.7 reveals that the transformation by which
$\mathbf{P}_{\bar{m}_A}'$ is generated from $\mathbf{P}_{\bar{m}_A}$ preserves all row
sums. Thus, $\mathbf{P}_{\bar{m}_A}'$ is very similar in form to
$\mathbf{P}_{\bar{m}_A}$. It is derived from a (fictitious) row stochastic matrix by
setting a specified row (row $\bar{m}_A$) to zero.

If  the  preceding  steps  are  repeated  for  |  P^  - 1|  where  m  ^  S^',  except  that  all  N 
absorbing  state  columns  n^  e  S^'  are  coalesced  rather  than  just  the  N  -  1  columns 
n^^  e  Sa'  -  {m^},  a  result  very  similar  in  form  to  Eq.  7.6  obtains.  That  is, 


-     -       _       (-MLaflQs'l         O(a') 


Eq.  7.8 


(1+ar-^        (1+a)^'^' 
where $|Q_{\bar{m}}'|$ is the order $N' - N$ principal minor of $|Q_{\bar{m}}|$ generated by deleting
the $N$ absorbing state row/column pairs and $s$ is an integer satisfying $s > N + 1$. The
nonabsorbing state row $\bar{m}$ contains $-1$ at its principal diagonal location and zeros elsewhere.

Substitution of Eq. 7.6 and 7.8 into Proposition 6.7 yields a form more amenable to
examination of the $\alpha \to 0^+$ limiting stationary distribution. The two cases
$\bar{m} \in S_A'$ and $\bar{m} \in S' - S_A'$ must be distinguished. Then, after some
straightforward algebra,
{     |Qi,;|  +  0(a)/(l+af-'^-'
S   |QH;|  +  0(a)/(l+af-^^' 


qa(m)  = 


m  =  m,  e  S.' 


0(a)/(l+ay 


N'-N  +  l 


X    Qh'  +0(a)/(l+a) 


—      meS'-S' 

N'-N+l  ^ 


An equivalent result expressed in terms of the auxiliary matrices $P_{\bar{m}_A}'$ is


qa(m)  =  ' 


0(a)/(l+af  "^"' 
Z   |PH;-I'|  +  0(a)/(l+ar-^^' 


m  =  mA  e  S, 


me  S'-S/ 


Eq.  7.9 


By retracing the preceding steps by which $P_{\bar{m}_A}$ was transformed into $P_{\bar{m}_A}'$,
companion results to Eq. 7.7 and Eq. 7.9 can be developed for $P(\bar{m}_A)'$. The companion to
Eq. 7.7 differs only in the elements of row $\bar{n} = \bar{m}_A$. Thus, if $P(\bar{m} | \bar{n})'$
denotes the general element in $P(\bar{m}_A)'$, then

Eq.  7.10 


Vm,ne  _  _ 


P(m  I  n) 


t-  - 


P(m|n)  +  -P(njn) 


me  S'-S^' 
+  {mJ  +  S(mJ'-S/' 


m  e  S(nJ' 


Further, examination of Eq. 7.10 reveals that the row sum constraint on $P$ is preserved in
the transformation by which $P(\bar{m}_A)'$ is generated (i.e. $P(\bar{m}_A)'$ is a stochastic
matrix). Thus, $|P(\bar{m}_A)' - I'| = 0$. A consequence is the Proposition 6.8 counterpart of Eq. 7.9,

Eq.7.11 

|P(SJ'-I'|  -|Ps ;-r|  +  0(a)/(l+af -^^' 


qa(m) 


I  i|P(nJ'-r|  -|PH;-r||  +o(a)/(i+ar-^^' 

0(a)/(l  +af-^*^ ■ 

■vN'-N+l 


I     |P(nJ'-I'|-  Ps'-r     +0(a)/(l+a) 


m  =  m.  6  Sa' 


me  S'-S  ' 
7.5  The  Stationary  Distribution  Limit

The zero mutation probability limits of Eq. 7.9 and Eq. 7.11 exist if the determinant
sums in the denominators are nonzero. In fact they are nonzero, as demonstrated in the
following. This argument is very similar in form to the development in Section 6.3
concerning positivity of the stationary distribution. The essential step is demonstration of the
existence of a primitive stochastic matrix $Q'$ which satisfies both

$$0 \le \lim_{\alpha \to 0^+} P_{\bar{m}_A}' \le Q' \quad \text{and} \quad Q' \ne \lim_{\alpha \to 0^+} P_{\bar{m}_A}'.$$

If the two-operator algorithm is under consideration, then the elements of the
$\alpha \to 0^+$ limit of $P_{\bar{m}_A}'$ are obtained by substituting the one-operator results in
Eq. 4.2-5 into Eq. 7.7. If the three-operator case is under consideration, then Eq. 4.22 and
Eq. 4.24,25 are employed. In the following, the two-operator notation is employed.

Let $Q'$ be generated from the $\alpha \to 0^+$ limit of $P_{\bar{m}_A}'$ by replacing row
$\bar{m}_A$ with the row whose elements are given by

$$\forall \bar{m} \in S' - S_A' + \{\bar{m}_A\}: \quad Q'(\bar{m} | \bar{m}_A) = \frac{1}{N' - N + 1} > 0. \qquad \text{Eq. 7.12}$$

Thus, the row sum of row $\bar{m}_A$ in $Q'$ is 1. Since all remaining rows of $Q'$ are identical to
those of the $\alpha \to 0^+$ limit of $P_{\bar{m}_A}'$, and consequently have row sum 1 by Eq. 7.7,
$Q'$ is a stochastic matrix. Additionally, it satisfies both

$$0 \le \lim_{\alpha \to 0^+} P_{\bar{m}_A}' \le Q' \quad \text{and} \quad Q' \ne \lim_{\alpha \to 0^+} P_{\bar{m}_A}'.$$

$Q'$ can be regarded as the state transition matrix of a fictitious Markov chain
defined on the state space $S' - S_A' + \{\bar{m}_A\}$. Since $Q'(\bar{m}_A | \bar{m}_A) > 0$, the
fictitious Markov chain is both aperiodic (Definition A9, Theorem A2) and primitive (Definition B1,
Theorem B1) provided that it is irreducible (Definitions A7 and A8, Theorem A1). Thus,
primitivity is established by demonstrating that every state $\bar{m} \in S' - S_A' + \{\bar{m}_A\}$ is
accessible in some finite number of transitions from every state
$\bar{n} \in S' - S_A' + \{\bar{m}_A\}$. Since all states in $S' - S_A' + \{\bar{m}_A\}$ are accessible
in one transition from $\bar{m}_A$ (Eq. 7.12), it is sufficient to demonstrate that $\bar{m}_A$ is
accessible in some finite number of transitions from every state $\bar{n} \in S' - S_A'$.

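Let $i_A \in S$ be the bit-string represented in $\bar{m}_A$, let $\bar{n} \in S' - S_A'$ and let
$i_1 \in S$ be selected such that $\bar{n}(i_1) > 0$ and $H(i_1, i_A) \le H(i, i_A)$ for all $i$
represented in $\bar{n}$. Then, two cases must be examined. In case (1), $i_1 = i_A$ while
$i_1 \ne i_A$ for case (2).

If $i_1 = i_A$, it follows from Eq. 7.7 and the construction of $Q'$ that

$$\begin{aligned}
Q'(\bar{m}_A | \bar{n}) &= \lim_{\alpha \to 0^+} P_{\bar{m}_A}'(\bar{m}_A | \bar{n}) = \lim_{\alpha \to 0^+} P_2(\bar{m}_A | \bar{n}) \\
&= \left[\lim_{\alpha \to 0^+} P_2(i_A | \bar{n})\right]^M \\
&= P_1(i_A | \bar{n})^M = P_1(i_1 | \bar{n})^M > 0
\end{aligned}$$

and consequently $\bar{m}_A$ is accessible from $\bar{n}$ in 1 transition. Otherwise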

$$\exists\, i_2 \in S(i_1) \ni H(i_2, i_A) = H(i_1, i_A) - 1$$

and further if $\bar{n}_1 \in S_A'$ is the one-operator absorbing state defined by the condition
$\bar{n}_1(i_1) = M$ while $\bar{n}_{12} \in S(\bar{n}_1)'$ is the adjacent nonabsorbing state defined by
$\bar{n}_{12}(i_1) = M - 1$, $\bar{n}_{12}(i_2) = 1$, then from Eq. 7.7 and the construction of $Q'$

$$\begin{aligned}
Q'(\bar{n}_{12} | \bar{n}) &= \lim_{\alpha \to 0^+} P_2(\bar{n}_{12} | \bar{n}) + \frac{1}{L} \lim_{\alpha \to 0^+} P_2(\bar{n}_1 | \bar{n}) \\
&= P_1(\bar{n}_{12} | \bar{n}) + \frac{1}{L} P_1(\bar{n}_1 | \bar{n}) \\
&= \frac{1}{L} P_1(\bar{n}_1 | \bar{n}) \\
&= \frac{1}{L} \left[P_1(i_1 | \bar{n})\right]^M > 0.
\end{aligned}$$

Thus, $\bar{n}_{12}$ is accessible from $\bar{n}$ in one transition. If $i_2 = i_A$, then by the case (1)
argument $\bar{m}_A$ is accessible in one additional transition. Otherwise, the case (2) argument is
repeated for some

$$i_3 \in S(i_2) \ni H(i_3, i_A) = H(i_2, i_A) - 1 = H(i_1, i_A) - 2.$$

This procedure necessarily terminates with $H(i_1, i_A) + 1$ applications and the corresponding
state space trajectory is executed with nonzero probability.
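
The case (2) recursion can be pictured as a walk from $i_1$ to $i_A$ that changes one bit per step. The following is only an illustrative sketch: it assumes that $H$ is the Hamming distance and that $S(i)$ is the set of strings at Hamming distance one from $i$, and the helper names `hamming` and `path_to_target` are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit strings stored as integers."""
    return bin(a ^ b).count("1")


def path_to_target(i1: int, i_a: int) -> list[int]:
    """Return the strings i1, i2, ..., i_A, where each step flips one bit
    that still differs from i_A, so the distance to i_A drops by one."""
    path = [i1]
    current = i1
    while current != i_a:
        diff = current ^ i_a
        lowest = diff & -diff   # lowest-order bit that still differs
        current ^= lowest       # flip it: an adjacent string, one step closer
        path.append(current)
    return path


# Hypothetical 5-bit example (assumed values, not taken from the text).
path = path_to_target(0b10110, 0b00011)
print([format(s, "05b") for s in path])
```

The walk visits $H(i_1, i_A) + 1$ strings, which matches the count of applications quoted above.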

From the foregoing argument, it follows that state $\bar{m}_A$ is accessible in some finite
number of transitions from every state $\bar{n} \in S' - S_A'$, and thus that $Q'$ is primitive.
Then, since both

$$0 \le \lim_{\alpha \to 0^+} P_{\bar{m}_A}' \le Q' \quad \text{and} \quad Q' \ne \lim_{\alpha \to 0^+} P_{\bar{m}_A}',$$

it follows from Theorem B4(e) that every eigenvalue of the $\alpha \to 0^+$ limit of
$P_{\bar{m}_A}'$ satisfies $|\lambda_i| < 1$. Proposition 7.4 below, which is the $\alpha \to 0^+$
counterpart of Proposition 6.9, is a consequence.
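
The domination argument can be checked numerically on a toy example. The sketch below is not the genetic algorithm matrix: `Q` is a hypothetical 3-state stochastic matrix whose first row is uniform (mimicking Eq. 7.12), and `P0` is `Q` with that row set to zero. Primitivity is tested with Wielandt's bound, under which a nonnegative $n \times n$ matrix is primitive exactly when its power $(n-1)^2 + 1$ is strictly positive.

```python
import numpy as np


def is_primitive(q: np.ndarray) -> bool:
    """Check primitivity of a nonnegative square matrix via Wielandt's bound."""
    n = q.shape[0]
    pattern = (q > 0).astype(int)
    reach = np.eye(n, dtype=int)
    for _ in range((n - 1) ** 2 + 1):
        # Keep only the zero/nonzero pattern to avoid overflow.
        reach = ((reach @ pattern) > 0).astype(int)
    return bool(reach.all())


# Hypothetical toy matrices (assumed values for illustration only).
Q = np.array([[1 / 3, 1 / 3, 1 / 3],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
P0 = Q.copy()
P0[0, :] = 0.0                      # 0 <= P0 <= Q and P0 != Q

spectral_radius = np.abs(np.linalg.eigvals(P0)).max()
determinant = np.linalg.det(P0 - np.eye(3))

print(is_primitive(Q), spectral_radius, determinant)
```

Here every eigenvalue of `P0` lies strictly inside the unit circle, so `P0 - I` is nonsingular and its determinant has the sign $(-1)^3$, the toy analogue of the two assertions in Proposition 7.4.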


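Proposition 7.4: The value of the determinant $\lim_{\alpha \to 0^+} |P_{\bar{m}_A}' - I'|$ satisfies

$$\begin{aligned}
\forall \bar{m}_A \in S_A': \quad &(1) \quad \lim_{\alpha \to 0^+} |P_{\bar{m}_A}' - I'| \ne 0 \\
&(2) \quad \text{the algebraic sign of } \lim_{\alpha \to 0^+} |P_{\bar{m}_A}' - I'| \text{ is } (-1)^{N' - N + 1}.
\end{aligned}$$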

The conditions asserted in Proposition 7.4 ensure that substitution into Eq. 7.9 and
7.11 yields a determinate form in the $\alpha \to 0^+$ limit. Propositions 7.5 and 7.6 below
represent the limiting forms, and consequently are respectively the limiting counterparts of
Propositions 6.7 and 6.8.


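Proposition 7.5: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed in the form

$$\lim_{\alpha \to 0^+} q_\alpha(\bar{m}) =
\begin{cases}
\dfrac{\lim_{\alpha \to 0^+} |P_{\bar{m}_A}' - I'|}{\sum_{\bar{n}_A \in S_A'} \lim_{\alpha \to 0^+} |P_{\bar{n}_A}' - I'|} & \bar{m} = \bar{m}_A \in S_A' \\[2ex]
0 & \bar{m} \in S' - S_A'
\end{cases}$$
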
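Proposition 7.6: The components of $\lim_{\alpha \to 0^+} q_\alpha$ exist and can be expressed alternatively as

$$\lim_{\alpha \to 0^+} q_\alpha(\bar{m}) = q_0(\bar{m}) =
\begin{cases}
\dfrac{\lim_{\alpha \to 0^+} \left\{ |P(\bar{m}_A)' - I'| - |P_{\bar{m}_A}' - I'| \right\}}{\sum_{\bar{n}_A \in S_A'} \lim_{\alpha \to 0^+} \left\{ |P(\bar{n}_A)' - I'| - |P_{\bar{n}_A}' - I'| \right\}} & \bar{m} = \bar{m}_A \in S_A' \\[2ex]
0 & \bar{m} \in S' - S_A'
\end{cases}$$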


An immediate consequence of Propositions 7.4 and 7.5 is strict positivity of the
zero mutation probability limiting stationary distribution components for all absorbing
state rows. That is, $\forall \bar{m}_A \in S_A' : q_0(\bar{m}_A) > 0$. The argument is analogous to
that at the conclusion of Section 6.3 concerning strict positivity of all stationary distribution
components when $\alpha > 0$. This result is anticipated by the simulation results in Section 5.5. A
consequence is that the required limiting behavior for direct application of the simulated
annealing convergence theory to the genetic algorithm model does not follow. However,
the results displayed in Section 5.5 and developments produced in Section 9.3 suggest
that the limiting distribution can be made arbitrarily close to the desired limiting behavior.

Since the $\alpha \to 0^+$ limit of the stationary distribution exists, the definition of
$q_\alpha$ can be extended to include the point $\alpha = 0$. That is

$$q_\alpha |_{\alpha = 0} = q_0 = \lim_{\alpha \to 0^+} q_\alpha$$

where the values of the required limits are provided by Proposition 7.5. Proposition 7.7
below follows from this extended definition of $q_\alpha$ and Proposition 7.2.

Proposition 7.7: For all $\alpha \ge 0$, the components of $q_\alpha$ are continuous rational
functions of the independent variable $\alpha$.

Proposition 7.3, which concerns the first derivative of $q_\alpha$, can also be extended to
include the limiting case. The extension requires easily obtainable counterparts of Eq.
7.1-3 developed for $|P_{\bar{n}_A}' - I'|$ and Eq. 7.9. The Eq. 7.1 counterpart is

(l+a)^'-^|PH;-i'l  =  es,(a)',  Eq.7.13 

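and that for Eq. 7.2 is

$$q_\alpha(\bar{m}) =
\begin{cases}
\dfrac{\theta_{\bar{m}_A}(\alpha)' + O(\alpha)}{\Theta(\alpha)' + O(\alpha)} & \bar{m} = \bar{m}_A \in S_A' \\[2ex]
\dfrac{O(\alpha)}{\Theta(\alpha)' + O(\alpha)} & \bar{m} \in S' - S_A'
\end{cases}
\qquad \text{Eq. 7.14}$$

where $\Theta(\alpha)'$ is the polynomial counterpart (summed over $\bar{n}_A \in S_A'$) of
$\Theta(\alpha)$ in Eq. 7.2. Differentiating Eq. 7.14 with respect to $\alpha$ yields a rational
function with denominator polynomial $[\Theta(\alpha)' + O(\alpha)]^2$ whose $\alpha \to 0^+$ limit is
nonzero by Proposition 7.4, Eq. 7.13 and the definition of $\Theta(\alpha)'$. Proposition 7.8 below
follows from Proposition 7.3 and these observations.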

Proposition 7.8: The components of the first derivative of $q_\alpha$ with respect to $\alpha$
possess limits as $\alpha \to 0^+$.

Thus, a zero mutation probability limit exists for the time-homogeneous two and
three-operator algorithm variants. The limit is represented by Propositions 7.5 and 7.6.
Further, Propositions 7.7 and 7.8 establish some useful ancillary results concerning the
stationary distribution behavior at the point $\alpha = 0$. These latter results are employed in the
following section in establishing strong ergodicity of the inhomogeneous genetic algorithm
Markov chain. Propositions 7.5 and 7.6 are used in Section 9 to develop a methodology
for representing the stationary distribution limit.


SECTION  8 
A  MONOTONIC  MUTATION  PROBABILITY  ERGODICITY  BOUND 

8.1  Overview 

The annealing schedule bounds for the simulated annealing algorithm, which are
reviewed in Section 2.4.2, are derived by requiring that the nonstationary Markov chain
which represents the algorithm be strongly ergodic (Definition A13) and then deducing a
monotonic lower bound on the algorithm control parameter. The methodology consists of
demonstrating that the time-homogeneous Markov chain corresponding to every positive
algorithm control parameter value possesses a stationary distribution, that the sequence of
stationary distributions corresponding to any sequence of positive control parameter
values converges to a limiting distribution if the control parameter sequence converges to
zero, and then employing Definitions A11-A13 and Theorems A5-A7 to deduce a
sufficient condition (the annealing schedule lower bound) to guarantee that the
nonstationary algorithm achieves the limiting distribution (i.e. strong ergodicity).

The model development in Section 4 demonstrates that for all mutation probability
values in the range $0 < p_m < 1$, the Markov chain representing either the two or
three-operator time-homogeneous simple genetic algorithm possesses a stationary distribution.
Section 7 demonstrates that the stationary distribution approaches a limit as the mutation
probability parameter approaches zero. This section proposes and then verifies a monotone
decreasing lower bound on the mutation probability sequence of the nonstationary
genetic algorithm Markov chain which is sufficient to ensure strong ergodicity.

8.2  A  Weak  Ergodicity  Bound

The following paragraphs propose and then verify a mutation probability parameter
bound sufficient to ensure that the Markov chain of the corresponding nonstationary simple
genetic algorithm is weakly ergodic (Definition A11). The bound applies to both the
two and three-operator algorithms, and it appears in Proposition 8.1 below.

Proposition 8.1: The mutation probability bound given by

$$p_m(k) \ge \frac{1}{2} k^{-1/(ML)}$$

is sufficient to ensure weak ergodicity of the corresponding nonstationary (two or
three-operator) simple genetic algorithm Markov chain.


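This result is established by using the lower bounds on the two and three-operator
conditional probabilities in Eq. 4.21 and 4.31 with Definitions A11 and A12 and Theorems
A5 and A6. Applying the lower bound in Eq. 4.21 and 4.31 to $\tau_1(\cdot)$ of Definition
A12 and Theorem A5 yields

$$\begin{aligned}
\tau_1(P) &= 1 - \min_{\bar{n}_1, \bar{n}_2} \sum_{\bar{m}} \min\left(P(\bar{m} | \bar{n}_1), P(\bar{m} | \bar{n}_2)\right) \\
&\le 1 - \sum_{\bar{m}} \binom{M}{\bar{m}} \left(\frac{\alpha}{1+\alpha}\right)^{ML} \\
&= 1 - \left(\frac{\alpha}{1+\alpha}\right)^{ML} \sum_{\bar{m}} \binom{M}{\bar{m}} \\
&= 1 - \left(\frac{\alpha}{1+\alpha}\right)^{ML} N^M \\
&= 1 - \left(\frac{2\alpha}{1+\alpha}\right)^{ML}.
\end{aligned}$$

Thus,

$$\left(1 - \tau_1(P)\right) \ge \left(\frac{2\alpha}{1+\alpha}\right)^{ML}$$

and consequently from Theorem A6, the chain is weakly ergodic if the sequence of control
parameter values $\{\alpha(k)\}$ satisfies

$$\sum_{k=1}^{\infty} \left(\frac{2\alpha(k)}{1+\alpha(k)}\right)^{ML} = \infty.$$

Comparing this result to the known divergent series $\sum k^{-1}$, it follows that the Markov
chain is weakly ergodic if the sequence $\{\alpha(k)\}$ satisfies

$$\left(\frac{2\alpha(k)}{1+\alpha(k)}\right)^{ML} \ge k^{-1}$$

from which

$$\left(\frac{\alpha(k)}{1+\alpha(k)}\right) \ge \frac{1}{2} k^{-1/(ML)}.$$

Using Eq. 4.13 to translate this result into an equivalent expression in $p_m(k)$ establishes
Proposition 8.1.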
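
The schedule in Proposition 8.1 is easy to tabulate. The sketch below is a minimal illustration, assuming that Eq. 4.13 is the relation $p_m = \alpha/(1+\alpha)$ (which is what the last two displays imply); the function names and the values of `M` and `L` are hypothetical.

```python
def mutation_bound(k: int, M: int, L: int) -> float:
    """Lower bound of Proposition 8.1: (1/2) * k**(-1/(M*L))."""
    return 0.5 * k ** (-1.0 / (M * L))


def alpha_from_pm(p_m: float) -> float:
    """Invert the assumed relation p_m = alpha / (1 + alpha)."""
    return p_m / (1.0 - p_m)


M, L = 4, 3   # hypothetical population size and string length
series_terms = {}
for k in (1, 10, 1000):
    p = mutation_bound(k, M, L)
    a = alpha_from_pm(p)
    # General term of the series tested against Theorem A6.
    series_terms[k] = (2 * a / (1 + a)) ** (M * L)
    print(k, round(p, 4), series_terms[k] * k)
```

With the bound met with equality, each series term equals $1/k$, so the series reduces to the harmonic series and diverges as required.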

8.3  Strong  Ergodicity 

The mutation probability schedule bound advanced in Proposition 8.1 is also sufficient
to achieve strong ergodicity if it satisfies the condition on the sequence of vector
differences in Theorem A7. The required sequence of vectors can be selected as the
sequence of stationary distributions of the time-homogeneous Markov chains associated
with the parameter sequence $\{p_m(k)\}$ (or equivalently with the corresponding sequence
$\{\alpha(k)\}$).

Section 4 establishes that a stationary distribution exists for the time-homogeneous
two and three-operator algorithms corresponding to every value of $\alpha$ satisfying $\alpha > 0$.
Thus, associated with the sequence of control parameter values $\{\alpha(k)\}$ is a sequence of
vectors $\{q_k\}$ where $q_k = q_\alpha$ evaluated at $\alpha = \alpha(k)$. Further, based upon results
established in Section 6, Section 7 demonstrates that an $\alpha \to 0^+$ limiting stationary
distribution exists (Propositions 7.5 and 7.6), that the stationary distribution vector varies
continuously for all $\alpha$ satisfying $\alpha \ge 0$ (Proposition 7.7) and that its first derivative
exists and is continuous for all $\alpha$ satisfying $\alpha > 0$ (Proposition 7.3). In particular,
$q_\alpha$ is continuous on the closed interval $0 \le \alpha \le 1$ and its first derivative exists at
every interior point of that interval.
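Therefore, if consideration is limited to monotone decreasing control parameter
sequences, then by the mean value theorem the difference between the $\bar{m}$ components of
any two consecutive vectors in the sequence can be written as

$$q_{k+1}(\bar{m}) - q_k(\bar{m}) = \left[\frac{dq_\alpha(\bar{m})}{d\alpha}\right]_{\alpha = \alpha^*(k)} \left(\alpha(k+1) - \alpha(k)\right)$$

where the value $\alpha^*(k)$ satisfies $\alpha(k+1) < \alpha^*(k) < \alpha(k)$. Consequently,

$$\left|q_{k+1}(\bar{m}) - q_k(\bar{m})\right| = \left|\frac{dq_\alpha(\bar{m})}{d\alpha}\right|_{\alpha = \alpha^*(k)} \times \left|\alpha(k+1) - \alpha(k)\right|$$

and

$$\sum_{k=1}^{\infty} \left|q_{k+1}(\bar{m}) - q_k(\bar{m})\right| = \sum_{k=1}^{\infty} \left|\frac{dq_\alpha(\bar{m})}{d\alpha}\right|_{\alpha = \alpha^*(k)} \times \left|\alpha(k+1) - \alpha(k)\right|. \qquad \text{Eq. 8.1}$$
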


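From Propositions 7.3 and 7.8, it is possible to define a function $g_\alpha(\bar{m})$ which is
continuous in $\alpha$ on the closed interval $0 \le \alpha \le 1$ as follows

$$g_\alpha(\bar{m}) =
\begin{cases}
\dfrac{dq_\alpha(\bar{m})}{d\alpha} & 0 < \alpha \le 1 \\[2ex]
\lim_{\alpha \to 0^+} \dfrac{dq_\alpha(\bar{m})}{d\alpha} & \alpha = 0
\end{cases}
\qquad \text{Eq. 8.2}$$

Then, from a fundamental theorem in the calculus of functions (a function continuous on a
closed, bounded interval is bounded there), it follows that $g_\alpha(\bar{m})$ (and
consequently that $|g_\alpha(\bar{m})|$) is bounded on the closed interval $0 \le \alpha \le 1$. Thus, if

$$B = \sup_{\bar{m} \in S',\ \alpha \in [0, 1]} |g_\alpha(\bar{m})|,$$

then it follows from Eq. 8.2 that at every interior point of the interval $0 \le \alpha \le 1$

$$\left|\frac{dq_\alpha(\bar{m})}{d\alpha}\right| \le B, \qquad \text{Eq. 8.3}$$
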

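and application of this result to Eq. 8.1 yields

$$\begin{aligned}
\sum_{k=1}^{\infty} \left|q_{k+1}(\bar{m}) - q_k(\bar{m})\right| &= \sum_{k=1}^{\infty} \left|\frac{dq_\alpha(\bar{m})}{d\alpha}\right|_{\alpha = \alpha^*(k)} \times \left|\alpha(k+1) - \alpha(k)\right| \\
&\le \sum_{k=1}^{\infty} B \times \left|\alpha(k+1) - \alpha(k)\right| \\
&= B \sum_{k=1}^{\infty} \left|\alpha(k+1) - \alpha(k)\right|.
\end{aligned}
\qquad \text{Eq. 8.4}$$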


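Since only monotonic control parameter sequences are under consideration, the sum
in the last line of Eq. 8.4 can be written as the difference of the initial and final parameter
values of the sequence. Thus,

$$\begin{aligned}
\sum_{k=1}^{\infty} \left|q_{k+1}(\bar{m}) - q_k(\bar{m})\right| &\le B\left(\alpha(1) - \alpha(\infty)\right) \\
&= B(1 - 0) \\
&= B < \infty.
\end{aligned}
\qquad \text{Eq. 8.4}$$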
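
The telescoping step can be illustrated numerically. The sketch below assumes the schedule meets the Proposition 8.1 bound with equality and that $p_m = \alpha/(1+\alpha)$, so that $\alpha(1) = 1$; the parameter values and the truncation point `K` are hypothetical.

```python
def alpha_schedule(k: int, M: int, L: int) -> float:
    """alpha(k) when p_m(k) = (1/2) k**(-1/(M*L)) and p_m = alpha/(1+alpha)."""
    p_m = 0.5 * k ** (-1.0 / (M * L))
    return p_m / (1.0 - p_m)


M, L = 2, 2          # hypothetical parameters
K = 10_000           # truncation point of the infinite sum
alphas = [alpha_schedule(k, M, L) for k in range(1, K + 1)]

# Sum of |alpha(k+1) - alpha(k)| over the truncated schedule.
total_variation = sum(abs(b - a) for a, b in zip(alphas, alphas[1:]))

# For a monotone sequence the sum telescopes to first minus last.
print(total_variation, alphas[0] - alphas[-1])
```

Because the sequence is monotone, the partial sums never exceed $\alpha(1) = 1$, however slowly $\alpha(k)$ decays.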


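The series of vector differences required for application to Theorem A7 can then be
written as

$$\begin{aligned}
\sum_{k=1}^{\infty} \left\|q_{k+1} - q_k\right\|_1 &= \sum_{k=1}^{\infty} \sum_{\bar{m} \in S'} \left|q_{k+1}(\bar{m}) - q_k(\bar{m})\right| \\
&\le \sum_{\bar{m} \in S'} B \\
&= N'B < \infty.
\end{aligned}
\qquad \text{Eq. 8.5}$$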


Applying the combined results of Proposition 8.1 and Eq. 8.5 to Theorem A7 produces
the goal of this section.


Proposition 8.2: The mutation probability bound given by

$$p_m(k) \ge \frac{1}{2} k^{-1/(ML)}$$

is sufficient to ensure strong ergodicity of the corresponding Markov chain. Further, the
Markov chain representing any nonstationary two or three-operator simple genetic
algorithm for which the mutation probability sequence both observes this bound and
converges to zero achieves (asymptotically) the limiting probability distribution defined in
Propositions 7.5 and 7.6.

8.4  Comparison  With  the  Simulated  Annealing  Parameter  Bound

It is instructive to compare the mutation probability sequence bound developed here
with the annealing schedule bounds reviewed in Section 2.4.2.2, both of which are of the
form $K/\log(k)$. Let $\rho(k)$ be defined as the ratio

$$\rho(k) = p_m(k)/T(k)$$

where $p_m(k)$ is selected as the bound developed herein and $T(k)$ is selected as the bound
provided by either Eq. 2.12 or Eq. 2.13. That is

$$\begin{aligned}
\rho(k) &= \frac{1}{2} k^{-1/(ML)} / [K/\log(k)] \\
&= \frac{1}{2K} \log(k)/k^{1/(ML)}.
\end{aligned}
\qquad \text{Eq. 8.5}$$

Thus, decreasing values of $\rho(k)$ imply that the genetic algorithm convergence rate is
superior (asymptotically) to that of the simulated annealing algorithm.

Now, let $k = \exp(x)$, or equivalently $x = \log(k)$. Substituting into Eq. 8.5 yields

$$\rho(k) = \frac{1}{2K}\, x \exp\left[-\frac{x}{ML}\right].$$

Then, since for all positive constants $\gamma$, the limit of $x\exp(-\gamma x)$ as $x \to \infty$ is
zero, it follows that

$$\lim_{k \to \infty} \rho(k) = \frac{1}{2K} \lim_{k \to \infty} \left[\log(k)/k^{1/(ML)}\right] = 0. \qquad \text{Eq. 8.6}$$

Thus, the nonstationary simple genetic algorithm provides an asymptotically superior
convergence rate.
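
The behavior of the ratio is easiest to see in the variable $x = \log(k)$, which also avoids overflow for large $k$. The sketch below is illustrative only; the constant `K` of the annealing bound and the values of `M` and `L` are hypothetical.

```python
import math


def rho_of_x(x: float, M: int, L: int, K: float = 1.0) -> float:
    """Ratio rho written in x = log(k): (1/(2K)) * x * exp(-x/(M*L))."""
    return x * math.exp(-x / (M * L)) / (2.0 * K)


M, L = 4, 3                      # hypothetical parameters, so M*L = 12
xs = [1.0, 12.0, 100.0, 1000.0]  # i.e. k = e, e**12, e**100, e**1000
values = [rho_of_x(x, M, L) for x in xs]
for x, v in zip(xs, values):
    print(x, v)
```

Since the derivative of $x\exp(-x/(ML))$ vanishes at $x = ML$, the ratio rises until $k = \exp(ML)$ and decays only beyond that point, so the limit in Eq. 8.6 describes the regime $k \gg \exp(ML)$.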


SECTION  9 

REPRESENTATION  OF  THE  STATIONARY  DISTRIBUTION  SOLUTION 

9.1  Overview 

Previous sections of this work establish some key results required for extrapolation
of the simulated annealing convergence theory onto the nonstationary Markov chain
model of the simple genetic algorithm. Specifically, existence of a unique stationary
distribution for the time-homogeneous two and three-operator algorithms is established in
Sections 4.3.2 and 4.3.3, and in Section 6, the existence argument is formulated into a
Cramer's rule expression for the stationary distribution components. Sections 7 and 8
continue that development by establishing the existence of a limiting distribution as the
mutation probability parameter approaches zero and a mutation probability sequence
bound sufficient to achieve it. However, the empirical results in Section 5.5 suggest a
complication, confirmed in Section 7, associated with the form of the limiting distribution.
The limiting distribution behavior necessary for extending the simulated annealing
global optimality result does not obtain because the limiting distribution is nonzero for all
states with uniform population (one-operator absorbing states), including those for
suboptimal solutions. The limiting distribution entropy results reported at the conclusion of
Section 5.5 support the intuitive notion that increasing the population size parameter
should bias the limiting distribution toward the desired behavior. However, to pursue that
notion further requires closer examination of the stationary distribution equations and the
requirements for their solution. This section begins that task. It is a very extensive
development and it stops short of explicit solution. However, it provides some insight into the
nature of the solution and additionally, it defines a promising approach to continuing the
work started here.

The essential task of representing the stationary distribution solution consists of
evaluating the determinants required to express the results of Propositions 6.7-8 and their
limiting counterparts in Propositions 7.5-6. The development proceeds by examining the
three distinct cases which arise from applying three different sets of constraints on the
value of the mutation probability parameter. The special case $p_m = 1/2 \Leftrightarrow \alpha = 1$ is
examined in Section 9.2. It leads to a very simple (trivial) result that is of no particular
interest in its own right, but is fundamental to the mechanism employed in Section 9.3 in
developing the more general case $0 < \alpha < 1$. The approach pursued in Section 9.3
involves expanding $-|P_{\bar{m}} - I| = |P - I| - |P_{\bar{m}} - I|$ as a multivariate Taylor's series in
the $N' \times N$ array of conditional probabilities $[P(i | \bar{n})]$ for $i \in S$ and $\bar{n} \in S'$
(defined by Eq. 4.14 for the two-operator algorithm and by Eq. 4.22 for its three-operator
counterpart) about the point corresponding to $\alpha = 1$. The product of that effort is an
expression for the coefficient of the general term of the series as a determinant with
combinatorial elements. The case $p_m \to 0^+ \Leftrightarrow \alpha \to 0^+$ is examined in Section 9.4.
The methodology developed in Section 9.3 extends with very little modification to represent the
$\alpha \to 0^+$ limiting behavior of $|P_{\bar{m}} - I|$. Section 9.5 concludes by pointing out some
significant identities which exist among the Taylor's series coefficients and the connection of
those identities to the algebra of symmetric and alternating polynomials. Its purpose is to
provide a foundation for extending the stationary distribution representation work begun here.

9.2  The  Limiting  Case  a  =  1 

As pointed out in Section 6, the determinants required for expressing the value of
the stationary distribution components by Propositions 6.7-8 are the characteristic
polynomials of the $P_{\bar{m}}$ matrices evaluated at $\lambda = 1$. The coefficients of the characteristic
polynomial of any square matrix $X$ with finite dimensions can be expressed in terms of the
principal minors of $|X|$ (i.e. minors generated from $|X|$ by deleting combinations of rows
and columns with the same indices). For example, the characteristic polynomial of $P$ can
be expressed as

$$\begin{aligned}
\phi(\lambda) &= |P - \lambda I| \\
&= A_{N'} - A_{N'-1}\lambda + A_{N'-2}\lambda^2 - A_{N'-3}\lambda^3 \\
&\quad + \cdots + (-1)^{N'-1} A_1 \lambda^{N'-1} + (-1)^{N'} A_0 \lambda^{N'} \\
&= \sum_{u} (-1)^{N'-u} A_u \lambda^{N'-u}
\end{aligned}
\qquad \text{Eq. 9.1}$$

where $N' = \mathrm{card}(S')$ is the dimension of $P$ and $A_u$ is the sum of its order $u$ principal
minors. This result is fundamental to the theory of square matrices and follows from
application of elementary determinant expansion operations to $|P - \lambda I|$ [Aitk54, MoSt64,
Muir60]. Exactly

$$\binom{N'}{u} = \frac{N'!}{u!\,(N'-u)!}$$

order $u$ principal minors are summed to produce $A_u$. The values of some of the $A_u$'s are

$$A_u =
\begin{cases}
|P| & u = N' \\
\mathrm{trace}(P) & u = 1 \\
1 & u = 0
\end{cases}
\qquad \text{Eq. 9.2}$$

where the $A_0$ result follows from the convention that the single order zero principal minor
of $|X|$ has value 1.
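
The principal minor expansion in Eq. 9.1 can be checked directly on a small matrix. The sketch below is illustrative: the test matrix is an arbitrary random row-stochastic matrix, not a genetic algorithm transition matrix, and the helper names are hypothetical.

```python
from itertools import combinations

import numpy as np


def principal_minor_sum(x: np.ndarray, u: int) -> float:
    """Sum of all order-u principal minors of x (the A_u of Eq. 9.1)."""
    if u == 0:
        return 1.0  # convention: the single order-zero minor has value 1
    n = x.shape[0]
    return float(sum(np.linalg.det(x[np.ix_(idx, idx)])
                     for idx in combinations(range(n), u)))


def char_poly_at(x: np.ndarray, lam: float) -> float:
    """Evaluate |x - lam*I| from the principal minor sums, as in Eq. 9.1."""
    n = x.shape[0]
    return sum((-1) ** (n - u) * principal_minor_sum(x, u) * lam ** (n - u)
               for u in range(n + 1))


rng = np.random.default_rng(0)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)   # arbitrary row-stochastic test matrix

lam = 0.3
direct = np.linalg.det(P - lam * np.eye(4))
from_minors = char_poly_at(P, lam)
print(direct, from_minors)
```

Evaluating the same expansion at $\lambda = 1$ returns (numerically) zero for any stochastic matrix, which is the statement $|P - I| = 0$ used throughout this section.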

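In a fashion exactly analogous to Eq. 9.1, the characteristic polynomial of $P_{\bar{m}}$ can be
written as

$$\begin{aligned}
\phi_{\bar{m}}(\lambda) &= |P_{\bar{m}} - \lambda I| \\
&= A^{\bar{m}}_{N'} - A^{\bar{m}}_{N'-1}\lambda + A^{\bar{m}}_{N'-2}\lambda^2 - A^{\bar{m}}_{N'-3}\lambda^3 \\
&\quad + \cdots + (-1)^{N'-1} A^{\bar{m}}_1 \lambda^{N'-1} + (-1)^{N'} A^{\bar{m}}_0 \lambda^{N'} \\
&= \sum_{u} (-1)^{N'-u} A^{\bar{m}}_u \lambda^{N'-u}
\end{aligned}
\qquad \text{Eq. 9.3}$$

where $A^{\bar{m}}_u$ is the sum of all order $u$ principal minors of $|P_{\bar{m}}|$. Thus, the value of
each determinant required for expressing $q$ via Propositions 6.7-8 can be written respectively
in the form

$$\begin{aligned}
|P_{\bar{m}} - I| &= \phi_{\bar{m}}(1) \\
&= A^{\bar{m}}_{N'} - A^{\bar{m}}_{N'-1} + A^{\bar{m}}_{N'-2} - A^{\bar{m}}_{N'-3} \\
&\quad + \cdots + (-1)^{N'-1} A^{\bar{m}}_1 + (-1)^{N'} A^{\bar{m}}_0 \\
&= \sum_{u} (-1)^{N'-u} A^{\bar{m}}_u
\end{aligned}$$

and

$$\begin{aligned}
|P - I| - |P_{\bar{m}} - I| &= \phi(1) - \phi_{\bar{m}}(1) \\
&= (A_{N'} - A^{\bar{m}}_{N'}) - (A_{N'-1} - A^{\bar{m}}_{N'-1}) + (A_{N'-2} - A^{\bar{m}}_{N'-2}) - (A_{N'-3} - A^{\bar{m}}_{N'-3}) + \cdots \\
&= \sum_{u} (-1)^{N'-u} (A_u - A^{\bar{m}}_u).
\end{aligned}
\qquad \text{Eq. 9.4}$$

Further, the $A^{\bar{m}}_u$'s can be expressed in terms of the principal minors of $|P|$ because those
principal minors of $|P_{\bar{m}}|$ which include row $\bar{m}$ (the 0 row) have value zero while those
which exclude row $\bar{m}$ are identical to the corresponding principal minors of $|P|$. In
particular, the $A^{\bar{m}}_u$'s corresponding to Eq. 9.2 are

$$A^{\bar{m}}_u =
\begin{cases}
|P_{\bar{m}}| = 0 & u = N' \\
\mathrm{trace}(P_{\bar{m}}) = \mathrm{trace}(P) - P(\bar{m} | \bar{m}) & u = 1 \\
1 & u = 0
\end{cases}
\qquad \text{Eq. 9.5}$$
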

From Eq. 4.11 and Eq. 4.24, it follows that

$$\forall i \in S,\ \forall \bar{n} \in S': \quad P(i | \bar{n})\big|_{\alpha=1} = \frac{1}{2^L}. \qquad \text{Eq. 9.6}$$

The conditional probabilities are uniform at $\alpha = 1$. Then, by substituting from Eq. 9.6 into
Eq. 4.15 or 4.25, it follows that

$$\forall \bar{m}, \bar{n} \in S': \quad P(\bar{m} | \bar{n})\big|_{\alpha=1} = \binom{M}{\bar{m}} \prod_{i \in S} \left(\frac{1}{2^L}\right)^{\bar{m}(i)} = \binom{M}{\bar{m}} \frac{1}{2^{ML}}. \qquad \text{Eq. 9.7}$$

Thus, $P|_{\alpha=1} = [P(\bar{m} | \bar{n})]_{\alpha=1}$ is a rank one matrix, and therefore all minors of
$|P|_{\alpha=1}$ of order $u \ge 2$ are identically zero. Eq. 9.4 then reduces to

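$$\begin{aligned}
\left\{|P - I| - |P_{\bar{m}} - I|\right\}_{\alpha=1} &= \left\{\phi(1) - \phi_{\bar{m}}(1)\right\}_{\alpha=1} \\
&= (-1)^{N'-1}\left\{A_1 - A^{\bar{m}}_1\right\}_{\alpha=1} + (-1)^{N'}\left\{A_0 - A^{\bar{m}}_0\right\}_{\alpha=1},
\end{aligned}$$

and substituting from Eq. 9.2, 9.5 and 9.7 into this result produces

$$\begin{aligned}
\left\{|P - I| - |P_{\bar{m}} - I|\right\}_{\alpha=1} &= (-1)^{N'-1}\left\{\mathrm{trace}(P) - [\mathrm{trace}(P) - P(\bar{m} | \bar{m})]\right\}_{\alpha=1} \\
&= (-1)^{N'-1} P(\bar{m} | \bar{m})\big|_{\alpha=1} \\
&= (-1)^{N'-1} \binom{M}{\bar{m}} \frac{1}{2^{ML}}.
\end{aligned}
\qquad \text{Eq. 9.8}$$
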

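Employing Eq. 9.8 with Proposition 6.8 yields an explicit result for the $\alpha = 1$
limiting value of $q(\bar{m})$, i.e.

$$\begin{aligned}
q(\bar{m})\big|_{\alpha=1} &= \left. \frac{|P - I| - |P_{\bar{m}} - I|}{\sum_{\bar{n} \in S'} \left(|P - I| - |P_{\bar{n}} - I|\right)} \right|_{\alpha=1} \\
&= \frac{(-1)^{N'-1} \binom{M}{\bar{m}} \frac{1}{2^{ML}}}{\sum_{\bar{n} \in S'} (-1)^{N'-1} \binom{M}{\bar{n}} \frac{1}{2^{ML}}}
= \frac{\binom{M}{\bar{m}}}{\sum_{\bar{n} \in S'} \binom{M}{\bar{n}}}
= \binom{M}{\bar{m}} \frac{1}{2^{ML}}.
\end{aligned}$$

It is independent of the objective function of the underlying optimization problem
because at $p_m = 1/2 \Leftrightarrow \alpha = 1$ mutation completely nullifies the reproduction operator.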
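
The $\alpha = 1$ results can be verified by brute force on a very small instance. The sketch below assumes that a state $\bar{m} \in S'$ is an occupancy vector over the $2^L$ bit strings with entries summing to $M$, and it builds $P$ directly from Eq. 9.7; the values of `L` and `M` and all helper names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import factorial

import numpy as np

L, M = 2, 2                     # hypothetical string length and population size
N = 2 ** L


def occupancy_vectors(n_strings: int, pop_size: int) -> list[tuple[int, ...]]:
    """All populations of pop_size strings, as per-string occupancy counts."""
    return [tuple(combo.count(i) for i in range(n_strings))
            for combo in combinations_with_replacement(range(n_strings), pop_size)]


def multinomial(counts: tuple[int, ...]) -> int:
    """Multinomial coefficient (sum of counts)! / prod(count!)."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result


states = occupancy_vectors(N, M)
n_prime = len(states)

# Eq. 9.7: at alpha = 1 every row of P is the same multinomial distribution.
row = np.array([multinomial(m) / 2 ** (M * L) for m in states])
P = np.tile(row, (n_prime, 1))

identity = np.eye(n_prime)
errors = []
for idx in range(n_prime):
    P_m = P.copy()
    P_m[idx, :] = 0.0           # P with row m set to zero
    lhs = np.linalg.det(P - identity) - np.linalg.det(P_m - identity)
    rhs = (-1) ** (n_prime - 1) * row[idx]      # right side of Eq. 9.8
    errors.append(abs(lhs - rhs))
max_error = max(errors)
print(n_prime, np.linalg.matrix_rank(P), max_error)
```

In this instance the common row of $P$ is itself the stationary distribution, in agreement with the closed form for $q(\bar{m})$ at $\alpha = 1$.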

Although this trivial case is not of any particular interest on its own, it serves as the
basis for developing the general case $0 < \alpha < 1$ in Section 9.3. The essential idea is that $P$
is rank 1 at $\alpha = 1$, which makes the low order derivatives of $P$ with respect to the
conditional probabilities $P(i | \bar{n})$ have comparatively low rank, and this suggests expanding
$-|P_{\bar{m}} - I| = |P - I| - |P_{\bar{m}} - I|$ in a multivariate Taylor's series about the point
corresponding to $\alpha = 1$. The result reflected in Eq. 9.8 is the constant term of the series.

9.3  The  General  Case  0  <  a  <  1 

The state transition matrix of the two and three-operator algorithms is completely
determined by the fixed algorithm parameter $M$ and the $N' \times N$ array of conditional
probabilities $[P(i | \bar{n})]$ for $i \in S$ and $\bar{n} \in S'$. Each element of row $\bar{n}$ in $P$
consists of a multinomial coefficient and a distinct order $M$ product composed of integral powers
of the $P(i | \bar{n})$'s corresponding to row $\bar{n}$ (Eq. 4.15 or 4.25). Thus, the order $k$ principal
minor of $|P|$ generated by inclusion of rows $K = \{\bar{n}_1, \bar{n}_2, \cdots, \bar{n}_k\} \subset S'$ can be
written as an order $k \times M$ polynomial (composed of order $k \times M$ monomial terms) in the
$k \times N$ array of variables $[P(i | \bar{n})]$ for $i \in S$, $\bar{n} \in K$. The corresponding order $k$
principal minor of $|P_{\bar{m}}|$ has identical value provided $\bar{m} \notin K$ and is zero if
$\bar{m} \in K$. These facts along with the succinct representation of the $P(i | \bar{n})$'s as rational
functions of the objective function and algorithm parameters (Eq. 4.2, 4.11, 4.24) and the
degeneration of $P$ to rank 1 at $\alpha = 1$ (Eq. 9.7) suggest an attempt to expand
$\phi_{\bar{m}}(1) = |P_{\bar{m}} - I|$ as a multivariate Taylor's series in the $P(i | \bar{n})$'s about the point
$\alpha = 1$ where, according to Eq. 9.6, $\forall i \in S, \forall \bar{n} \in S': P(i | \bar{n}) = 1/2^L$.
(Actually, expanding the alternative form $\phi(1) - \phi_{\bar{m}}(1) = |P - I| - |P_{\bar{m}} - I|$ in a new
array of $N' \times N$ variables which uniquely determines $[P(i | \bar{n})]$ proves more productive).
The constant term in the series is provided by Eq. 9.8 and the highest order terms in the series
are the order $(N' - 1) \times M$ monomials contributed by the single nonzero order $N' - 1$
principal minor of $|P_{\bar{m}}|$.

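Let $\bar{r} = [r(i, \bar{n})]$ be an $N' \times N$ nonnegative integer array having rows of the form
$\bar{r}_{\bar{n}} = (r(0, \bar{n}), r(1, \bar{n}), \cdots, r((2^L - 1), \bar{n}))$. The nonnegative integer
$r(i, \bar{n})$ represents the exponent of the factor $(P(i | \bar{n}) - 1/2^L)$ appearing in a monomial
term of the Taylor's series.
Also, let $\|\bar{r}_{\bar{n}}\| = \sum_{i \in S} r(i, \bar{n})$ and
$\|\bar{r}\| = \sum_{\bar{n} \in S'} \|\bar{r}_{\bar{n}}\| = \sum_{\bar{n} \in S'} \sum_{i \in S} r(i, \bar{n})$.
Then, the Taylor's series expansion of $-|P_{\bar{m}} - I| = |P - I| - |P_{\bar{m}} - I|$ can be expressed as

$$\begin{aligned}
-|P_{\bar{m}} - I| &= |P - I| - |P_{\bar{m}} - I| \\
&= \sum_{\bar{r}} C(\bar{m}, \bar{r}) \times \prod_{\bar{n} \in S'} \prod_{i \in S} \left(P(i | \bar{n}) - 1/2^L\right)^{r(i, \bar{n})}
\end{aligned}
\qquad \text{Eq. 9.9}$$

where

$$\begin{aligned}
C(\bar{m}, \bar{r}) &= \left. \frac{\partial^{\|\bar{r}\|} \left(-|P_{\bar{m}} - I|\right)}{\prod_{\bar{n} \in S'} \prod_{i \in S} r(i, \bar{n})!\, \partial P(i | \bar{n})^{r(i, \bar{n})}} \right|_{\alpha=1}
= \frac{\left(-|P_{\bar{m}} - I|\right)^{(\bar{r})}\big|_{\alpha=1}}{\bar{r}!} \\
&= \left. \frac{\partial^{\|\bar{r}\|} \left(|P - I| - |P_{\bar{m}} - I|\right)}{\prod_{\bar{n} \in S'} \prod_{i \in S} r(i, \bar{n})!\, \partial P(i | \bar{n})^{r(i, \bar{n})}} \right|_{\alpha=1}
= \frac{\left(|P - I| - |P_{\bar{m}} - I|\right)^{(\bar{r})}\big|_{\alpha=1}}{\bar{r}!}
\end{aligned}
\qquad \text{Eq. 9.10}$$

is the coefficient of the order $\|\bar{r}\|$ monomial term uniquely identified by the nonnegative
integer array $\bar{r}$. In these expressions, the symbol $\bar{r}!$ denotes the operation

$$\bar{r}! = \prod_{\bar{n} \in S'} \prod_{i \in S} [r(i, \bar{n})!].$$

Expressing the value of $C(\bar{m}, \bar{r})$ thus reduces to evaluating the indicated mixed partial
derivative of $-|P_{\bar{m}} - I| = |P - I| - |P_{\bar{m}} - I|$ at $\alpha = 1$ divided by $\bar{r}!$.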
The coefficient of the order $\|r\| = \|0\| = 0$ term is

$$C(m,0) = \left\{\left(|P - I| - |P_m - I|\right)^{(0)}\right\}_{\alpha=1} = \left\{|P - I| - |P_m - I|\right\}_{\alpha=1}$$

and its value is the constant term of the series, provided by Eq. 9.8. The coefficient of the first order monomial term which results from setting

$$r^{(1)} = [r(i,n)^{(1)}] \quad \text{where} \quad r(i,n)^{(1)} = \begin{cases} 1 & i = i_1,\ n = n_1 \\ 0 & \text{otherwise} \end{cases}$$

is given by

$$C(m, r^{(1)}) = \left\{ \frac{\partial\left(|P - I| - |P_m - I|\right)}{\partial P(i_1 \mid n_1)} \right\}_{\alpha=1} = \left\{\left(|P - I| - |P_m - I|\right)^{(r^{(1)})}\right\}_{\alpha=1}.$$

The associated monomial term in the Taylor's series expansion of $-|P_m - I| = |P - I| - |P_m - I|$ is given by

$$C(m, r^{(1)}) \times \left(P(i_1 \mid n_1) - 1/2^L\right).$$

In subsequent paragraphs, $\|r_m\| = 0$ and $\|r_n\| \le M$ for $n \ne m$, which together imply $\|r\| \le (N' - 1) \times M$, are shown to be suitable upper bounds on the order of differentiation with respect to the $P(i \mid n)$'s when computing $C(m,r)$. Thus, the Taylor's series terminates at finite order, as indeed it must since $-|P_m - I|$ is a polynomial function of the $P(i \mid n)$'s, and as noted earlier, the highest order monomial terms are order $(N' - 1) \times M$.

These upper bounds on the order of differentiation (i.e. upper bounds on $\|r_n\|$), along with the lower bound of 0 on every component of $r$ imposed by the requirement that $r$ be a nonnegative integer array, can be represented to advantage in terms of a set related to $S'$. Let $S'$, which is completely determined by the parameters $L$ and $M$, be momentarily represented by $S'(M)$ (that is, let its dependence on $M$ be explicitly indicated) and let the set $S''$ be defined as the set union of all $S'(k)$ for $0 \le k \le M$. That is

$$S'' = S'(0) \cup S'(1) \cup \cdots \cup S'(M-1) \cup S'(M) = \bigcup_{k=0}^{M} S'(k).$$

The above constraints on the rows of $r$ are then equivalent to requiring that every row of $r$ be drawn from $S''$, with the additional requirement that row $m$ be the specific element $r_m = 0 \in S''$. Since the $S'(k)$ for distinct $k$ are disjoint, it follows that $N'' = \mathrm{card}(S'')$ is the sum of the $N'(k) = \mathrm{card}[S'(k)]$, and consequently from Eq. 4.1 and an elementary recursion on the binomial coefficient that

$$N'' = \binom{M + N}{M} = \binom{M + 2^L}{M}.$$

This result is precisely that supplied by Eq. 5.1, accompanying the state space enumeration empirical results tabulated in Section 5.2.
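The counting argument above can be checked by brute force for small parameters. The following is a minimal sketch, not part of the original development: it assumes that each element of $S'(k)$ is an occupancy vector of length $N = 2^L$ with nonnegative integer entries summing to $k$, and the helper names `populations` and `s_double_prime` are hypothetical.

```python
from itertools import product
from math import comb

def populations(N, k):
    """All length-N nonnegative integer vectors summing to k (assumed form of S'(k))."""
    return [w for w in product(range(k + 1), repeat=N) if sum(w) == k]

def s_double_prime(N, M):
    """S'' as the union of S'(k) for k = 0..M."""
    out = []
    for k in range(M + 1):
        out.extend(populations(N, k))
    return out

L, M = 2, 3                       # illustrative parameters only
N = 2 ** L
sizes = [len(populations(N, k)) for k in range(M + 1)]   # the N'(k)
S2 = s_double_prime(N, M)
print(sizes, len(S2), comb(M + N, M))
```

For these parameters the individual set sizes follow $N'(k) = \binom{k+N-1}{k}$ and their sum agrees with $\binom{M+N}{M}$; the enumeration is exponential in $N$, so it is only useful as a sanity check.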
Since $P_m$ is independent of the $P(i \mid m)$'s associated with row $m$ (row $m$ of $P_m$ is set to 0), it follows that no monomial term containing any of the factors $(P(i \mid m) - 1/2^L)$ appears in the expansion of $-|P_m - I|$. Further, since $|P - I| - |P_m - I| = -|P_m - I|$, no such monomial terms appear with nonzero coefficient in the expansion of $|P - I| - |P_m - I|$ either. (Due to the constraint $\forall n \in S': \sum_{i \in S} P(i \mid n) = 1$, the aggregate of such terms appearing in $|P - I|$ is identically zero.) This observation establishes the $\|r_m\| = 0$ bound and permits the following revision of the Eq. 9.10 definition of $C(m,r)$

$$C(m,r) = \begin{cases} \dfrac{\left\{\left(|P - I| - |P_m - I|\right)^{(r)}\right\}_{\alpha=1}}{r!} & \|r_m\| = 0 \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 9.11}$$


The derivative of an order $n$ determinant with respect to a variable $x$ can be written as the sum of the $n$ determinants generated by differentiating each row (or column) in turn with respect to $x$ [Aitk54, MoSt64, Muir60]. For example, if

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

then

$$\frac{d|A|}{dx} = \begin{vmatrix} \dfrac{da_{11}}{dx} & \dfrac{da_{12}}{dx} \\ a_{21} & a_{22} \end{vmatrix} + \begin{vmatrix} a_{11} & a_{12} \\ \dfrac{da_{21}}{dx} & \dfrac{da_{22}}{dx} \end{vmatrix}.$$

If the elements of any row in the given determinant are independent of $x$, then differentiation of that row introduces an all zero row and the value of the corresponding determinant is zero. In particular, if only one row of the given determinant depends upon $x$, then only one nonzero determinant appears in the row-derivative expansion.
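The row-derivative rule can be illustrated on a concrete $2 \times 2$ case. The sketch below is an illustration only: the matrix entries are arbitrary polynomials chosen for this example (they do not come from the text), and the function names are hypothetical.

```python
def det2(a):
    """Determinant of a 2x2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def A(x):
    """Illustrative matrix with polynomial entries (chosen arbitrarily)."""
    return [[x**2, 3 * x], [1 + x, x**3]]

def dA(x):
    """Entrywise derivatives of A with respect to x."""
    return [[2 * x, 3], [1, 3 * x**2]]

def ddet_rowwise(x):
    """Sum of the determinants obtained by differentiating one row at a time."""
    a, d = A(x), dA(x)
    return det2([d[0], a[1]]) + det2([a[0], d[1]])

def ddet_direct(x):
    """Derivative of |A| = x^5 - 3x - 3x^2 computed by hand."""
    return 5 * x**4 - 3 - 6 * x

print(ddet_rowwise(2), ddet_direct(2))  # both evaluate to 65
```

The two functions agree at every integer point tried, which is what the rule predicts since both sides are polynomials of the same degree.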

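Higher order and mixed partial derivatives of an order $n$ determinant can be expressed similarly, e.g.

$$\begin{aligned}
\frac{\partial^3 |A|}{\partial x_1^2 \partial x_2} ={}& \begin{vmatrix} \dfrac{\partial^3 a_{11}}{\partial x_1^2 \partial x_2} & \dfrac{\partial^3 a_{12}}{\partial x_1^2 \partial x_2} \\ a_{21} & a_{22} \end{vmatrix}
+ \begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1^2} & \dfrac{\partial^2 a_{12}}{\partial x_1^2} \\ \dfrac{\partial a_{21}}{\partial x_2} & \dfrac{\partial a_{22}}{\partial x_2} \end{vmatrix}
+ 2 \begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1 \partial x_2} & \dfrac{\partial^2 a_{12}}{\partial x_1 \partial x_2} \\ \dfrac{\partial a_{21}}{\partial x_1} & \dfrac{\partial a_{22}}{\partial x_1} \end{vmatrix} \\
&+ 2 \begin{vmatrix} \dfrac{\partial a_{11}}{\partial x_1} & \dfrac{\partial a_{12}}{\partial x_1} \\ \dfrac{\partial^2 a_{21}}{\partial x_1 \partial x_2} & \dfrac{\partial^2 a_{22}}{\partial x_1 \partial x_2} \end{vmatrix}
+ \begin{vmatrix} \dfrac{\partial a_{11}}{\partial x_2} & \dfrac{\partial a_{12}}{\partial x_2} \\ \dfrac{\partial^2 a_{21}}{\partial x_1^2} & \dfrac{\partial^2 a_{22}}{\partial x_1^2} \end{vmatrix}
+ \begin{vmatrix} a_{11} & a_{12} \\ \dfrac{\partial^3 a_{21}}{\partial x_1^2 \partial x_2} & \dfrac{\partial^3 a_{22}}{\partial x_1^2 \partial x_2} \end{vmatrix}
\end{aligned}$$

and again, differentiation of any row with respect to a variable upon which it does not depend introduces an all zero row. Thus, if in the preceding result the first row of $A$ is independent of $x_2$ and the second of $x_1$, then only one of the determinants in the expansion survives

$$\frac{\partial^3 |A|}{\partial x_1^2 \partial x_2} = \begin{vmatrix} \dfrac{\partial^2 a_{11}}{\partial x_1^2} & \dfrac{\partial^2 a_{12}}{\partial x_1^2} \\ \dfrac{\partial a_{21}}{\partial x_2} & \dfrac{\partial a_{22}}{\partial x_2} \end{vmatrix}.$$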

Since each $P(i \mid n)$ appears in only one row of $-|P_m - I| = |P - I| - |P_m - I|$, it follows from application of the preceding determinant differentiation rules that the mixed partial derivative $|P - I|^{(r)}$ can be written as the single determinant (indicated hereafter by $|(P - I)^{(r)}| = |P^{(r)} - I^{(r)}|$) generated by differentiating the rows of the matrix $(P - I)$ in accordance with $r$ and then computing the determinant of the matrix derivative. That is, due to the single-row dependence of $(P - I)$ on each $P(i \mid n)$, the two operations involved (differentiating $(P - I)$ and evaluating its determinant) commute. The same conclusion applies to any mixed partial derivative of $|P_m - I|$ with respect to the $P(i \mid n)$'s, and hence

$$\begin{aligned}
\left(|P - I| - |P_m - I|\right)^{(r)} &= |P - I|^{(r)} - |P_m - I|^{(r)} \\
&= |(P - I)^{(r)}| - |(P_m - I)^{(r)}| \\
&= |P^{(r)} - I^{(r)}| - |P_m^{(r)} - I^{(r)}|
\end{aligned} \qquad \text{Eq. 9.12}$$
Eq. 9.4 can be generalized to express the value of $(|P - I| - |P_m - I|)^{(r)}$ as indicated in the following. Let $r$ have $k \le N' - 1$ rows which specify nonzero order differentiation, let $K = \{n_1, n_2, \cdots, n_k\} \subset S'$ be the set of differentiated-row indices and further let $m \notin K$. Also, for $N' \ge u \ge k$ let $A_u(r)$ be the sum of all order $u$ principal minors of $|P^{(r)}|$ formed by including the $k$ differentiated rows indicated by $n \in K$ and $u - k$ of the $N' - k$ undifferentiated rows in $|P^{(r)}|$. Exactly

$$\binom{N' - k}{u - k} = \frac{(N' - k)!}{(u - k)!\,(N' - u)!}$$

order $u$ principal minors are summed to produce $A_u(r)$. Finally, let $A_u^m(r)$ be defined similarly for $|P_m^{(r)}|$. Then, applying the same elementary determinant expansion rules that lead to Eq. 9.1 and Eq. 9.3 to $|P^{(r)} - \lambda I^{(r)}|$ and $|P_m^{(r)} - \lambda I^{(r)}|$ yields

$$|P^{(r)} - \lambda I^{(r)}| = \sum_{u=k}^{N'} (-\lambda)^{N'-u} A_u(r)$$

and

$$|P_m^{(r)} - \lambda I^{(r)}| = \sum_{u=k}^{N'} (-\lambda)^{N'-u} A_u^m(r)$$

and substituting these results into Eq. 9.12 with $\lambda = 1$ yields the differentiated analog of Eq. 9.4

$$\left(|P - I| - |P_m - I|\right)^{(r)} = |P^{(r)} - I^{(r)}| - |P_m^{(r)} - I^{(r)}| = \sum_{u=k}^{N'} (-1)^{N'-u}\left(A_u(r) - A_u^m(r)\right). \qquad \text{Eq. 9.13}$$

If $|P^{(r)}|_K$ is the order $k$ principal minor of $|P^{(r)}|$ uniquely defined by the set of row/column indices $K = \{n_1, n_2, \cdots, n_k\} \subset S'$ where $m \notin K$ and if $|P^{(r)}|_{K_n}$ is the order $k + 1$ principal minor generated by including the undifferentiated row $n \in S' - K$ with $K$ (i.e. $K_n = K \cup n = \{n_1, n_2, \cdots, n_k, n\} \subset S'$), then

$$A_u(r) = \begin{cases} |P^{(r)}|_{S'} = |P^{(r)}| & u = N' \\[1ex] \displaystyle\sum_{n \in S' - K} |P^{(r)}|_{K_n} & u = k + 1 \\[1ex] |P^{(r)}|_K & u = k \end{cases} \qquad \text{Eq. 9.14}$$

Also, since every principal minor of $|P_m^{(r)}|$ is either identical to the corresponding principal minor of $|P^{(r)}|$ or 0 depending upon whether or not it includes row $m$, and since $m \notin K$ by hypothesis, it follows that the $A_u^m(r)$'s corresponding to Eq. 9.14 are

$$A_u^m(r) = \begin{cases} |P_m^{(r)}|_{S'} = |P_m^{(r)}| = 0 & u = N' \\[1ex] \displaystyle\sum_{n \in S' - K - \{m\}} |P^{(r)}|_{K_n} = A_{k+1}(r) - |P^{(r)}|_{K_m} & u = k + 1 \\[1ex] |P^{(r)}|_K & u = k \end{cases} \qquad \text{Eq. 9.15}$$


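All of the $N' - k$ undifferentiated rows in $\{P^{(r)}\}_{\alpha=1}$ are identical (Eq. 9.7) and consequently all minors of $\{P^{(r)}\}_{\alpha=1}$ of order $u \ge k + 2$ have value 0. Thus, in a fashion exactly analogous to the derivation of Eq. 9.8 from Eq. 9.4, Eq. 9.13 yields

$$\left\{\left(|P - I| - |P_m - I|\right)^{(r)}\right\}_{\alpha=1} = (-1)^{N'-k-1}\left\{A_{k+1}(r) - A_{k+1}^m(r)\right\}_{\alpha=1} + (-1)^{N'-k}\left\{A_k(r) - A_k^m(r)\right\}_{\alpha=1},$$

and substituting from Eq. 9.14 and 9.15 into this result produces

$$\left\{\left(|P - I| - |P_m - I|\right)^{(r)}\right\}_{\alpha=1} = (-1)^{N'-k-1}\left\{A_{k+1}(r) - \left[A_{k+1}(r) - |P^{(r)}|_{K_m}\right]\right\}_{\alpha=1} = (-1)^{N'-k-1}\left\{|P^{(r)}|_{K_m}\right\}_{\alpha=1} \qquad \text{Eq. 9.16}$$

from which, by substitution into Eq. 9.11, it follows that

$$C(m,r) = \begin{cases} \dfrac{(-1)^{N'-k-1}\left\{|P^{(r)}|_{K_m}\right\}_{\alpha=1}}{r!} & \|r_m\| = 0 \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 9.17}$$

Evaluating $C(m,r)$ thus requires evaluating the quotient of the order $k + 1$ principal minor $\{|P^{(r)}|_{K_m}\}_{\alpha=1}$ and $r!$.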

The order $k + 1$ principal minor $|P^{(r)}|_{K_m}$ is completely determined by the $(k + 1) \times (k + 1)$ sub-array of $P^{(r)}$ given by $[P^{(r)}(w \mid v)]$ for $w, v \in K_m$ where $P^{(r)}(w \mid v)$ denotes the indicated mixed partial derivative of $P(w \mid v)$. Further, in computing the determinant of this sub-array, the order in which the row/column indices are drawn from $K_m$ in the sub-array construction is immaterial because any transposition of the order introduces exactly one row transposition and one column transposition into the sub-array, so both the magnitude and the algebraic sign of its determinant are preserved. Thus, the most general form of $|P^{(r)}|_{K_m}$ can be expressed as

$$|P^{(r)}|_{K_m} = \begin{vmatrix}
P^{(r)}(m \mid m) & P^{(r)}(n_1 \mid m) & P^{(r)}(n_2 \mid m) & \cdots & P^{(r)}(n_k \mid m) \\
P^{(r)}(m \mid n_1) & P^{(r)}(n_1 \mid n_1) & P^{(r)}(n_2 \mid n_1) & \cdots & P^{(r)}(n_k \mid n_1) \\
P^{(r)}(m \mid n_2) & P^{(r)}(n_1 \mid n_2) & P^{(r)}(n_2 \mid n_2) & \cdots & P^{(r)}(n_k \mid n_2) \\
\vdots & \vdots & \vdots & & \vdots \\
P^{(r)}(m \mid n_k) & P^{(r)}(n_1 \mid n_k) & P^{(r)}(n_2 \mid n_k) & \cdots & P^{(r)}(n_k \mid n_k)
\end{vmatrix} \qquad \text{Eq. 9.18}$$

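From Eq. 4.15 and 4.25, it follows that each nonzero element in row $v \in K_m$ of $P^{(r)}$ is composed of a combinatorial coefficient and an order $M - \|r_v\|$ product of the $P(i \mid v)$'s. The general form of the element in column $w \in K_m$ of row $v$ is given by

$$P^{(r)}(w \mid v) = \begin{cases} \dbinom{M}{w} \dfrac{\prod_{i \in S} w(i)!}{\prod_{i \in S} (w(i) - r(i,v))!} \displaystyle\prod_{i \in S} P(i \mid v)^{w(i) - r(i,v)} & \forall i \in S: w(i) \ge r(i,v) \\[2ex] 0 & \text{otherwise} \end{cases}$$

which can be rewritten as

$$P^{(r)}(w \mid v) = \begin{cases} \dbinom{M}{w} \dfrac{\prod_{i \in S} w(i)!}{\prod_{i \in S} r(i,v)!\,(w(i) - r(i,v))!} \times \displaystyle\prod_{i \in S} [r(i,v)!]\, P(i \mid v)^{w(i) - r(i,v)} & \forall i \in S: w(i) \ge r(i,v) \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 9.19}$$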
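Further, by noting that the factor

$$\binom{M}{w} \frac{\prod_{i \in S} w(i)!}{\prod_{i \in S} r(i,v)!\,(w(i) - r(i,v))!} = \frac{M!}{\prod_{i \in S} w(i)!} \cdot \frac{\prod_{i \in S} w(i)!}{\prod_{i \in S} r(i,v)!\,(w(i) - r(i,v))!} = \frac{M!}{\prod_{i \in S} r(i,v)!\,(w(i) - r(i,v))!}, \qquad \forall i \in S: w(i) \ge r(i,v)$$

is a multinomial coefficient and designating it (via straightforward generalization of the convention introduced in Eq. 4.4) by

$$\binom{M}{w, r_v} = \begin{cases} \dfrac{M!}{\prod_{i \in S} (w(i) - r(i,v))!\, r(i,v)!} & \forall i \in S: w(i) \ge r(i,v) \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 9.20}$$

Eq. 9.19 simplifies to

$$P^{(r)}(w \mid v) = \binom{M}{w, r_v} \prod_{i \in S} \left[r(i,v)!\, P(i \mid v)^{w(i) - r(i,v)}\right]. \qquad \text{Eq. 9.21}$$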


If row $v$ is undifferentiated (i.e. $\|r_v\| = 0$), then Eq. 9.21 becomes

$$P^{(r)}(w \mid v) = \binom{M}{w} \prod_{i \in S} P(i \mid v)^{w(i)} = P(w \mid v).$$
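The closed form of Eq. 9.21 can be compared against repeated application of the power rule. The sketch below is illustrative only and rests on two assumptions: that $P(w \mid v)$ is the multinomial form $\binom{M}{w}\prod_i P(i \mid v)^{w(i)}$ shown just above, and that the $P(i \mid v)$'s are treated as independent variables during differentiation. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def multinomial(w):
    """Multinomial coefficient M! / prod(w_i!) with M = sum(w)."""
    return factorial(sum(w)) // prod(factorial(x) for x in w)

def stepwise(w, r, p):
    """Differentiate multinomial(w) * prod(p_i ** w_i) r_i times in each p_i via the power rule."""
    coeff, exps = Fraction(multinomial(w)), list(w)
    for i, order in enumerate(r):
        for _ in range(order):
            coeff *= exps[i]              # becomes 0 once the exponent is exhausted
            exps[i] = max(exps[i] - 1, 0)
    return coeff * prod(pi ** e for pi, e in zip(p, exps))

def closed_form(w, r, p):
    """Eq. 9.21: generalized multinomial (Eq. 9.20) times prod(r_i! * p_i ** (w_i - r_i))."""
    if any(ri > wi for wi, ri in zip(w, r)):
        return Fraction(0)
    gen = Fraction(factorial(sum(w)),
                   prod(factorial(wi - ri) * factorial(ri) for wi, ri in zip(w, r)))
    return gen * prod(factorial(ri) * pi ** (wi - ri) for wi, ri, pi in zip(w, r, p))

def agree_everywhere(N, M, p):
    """Compare both forms for every w in S'(M) and every r with ||r|| <= M."""
    vectors = [v for v in product(range(M + 1), repeat=N) if sum(v) <= M]
    return all(stepwise(w, r, p) == closed_form(w, r, p)
               for w in vectors if sum(w) == M for r in vectors)

p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # arbitrary illustrative values
print(agree_everywhere(3, 3, p))
```

Exact rational arithmetic is used so that agreement is an identity rather than a floating-point coincidence; setting `r` to the zero vector reproduces $P(w \mid v)$ itself, as in the undifferentiated case.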


It is noted in passing that if $M < N = 2^L$ then it follows from Eq. 9.20 that either

$$\binom{M}{w, r_v} = 0 \quad \text{or} \quad \exists w' \in S' \ni \binom{M}{w, r_v} = \binom{M}{w'}. \qquad \text{Eq. 9.22}$$

In the latter case, it is also true that

$$\binom{M}{w'} \ge \binom{M}{w}.$$

The enabling condition imposes no practical limitation because any algorithm with $M \ge N$ could be effectively supplanted by exhaustive search over $S$.
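A small exhaustive check illustrates why the condition $M < N$ matters in Eq. 9.22. The sketch below is not from the original text: the construction of $w'$ (packing the nonzero parts $w(i) - r(i,v)$ and $r(i,v)$ into a length-$N$ vector) is one possible witness chosen for this illustration, and the function names are hypothetical.

```python
from itertools import product
from math import factorial, prod

def gen_multinomial(w, r):
    """Eq. 9.20: M! / prod((w_i - r_i)! r_i!), or 0 if some r_i exceeds w_i."""
    if any(ri > wi for wi, ri in zip(w, r)):
        return 0
    return factorial(sum(w)) // prod(factorial(wi - ri) * factorial(ri)
                                     for wi, ri in zip(w, r))

def multinomial(w):
    """Ordinary multinomial coefficient, i.e. gen_multinomial with r = 0."""
    return gen_multinomial(w, (0,) * len(w))

def witness(w, r):
    """Pack the nonzero parts of (w - r, r) into a length-N vector w', if they fit."""
    parts = [x for wi, ri in zip(w, r) for x in (wi - ri, ri) if x > 0]
    if len(parts) > len(w):
        return None
    return tuple(parts) + (0,) * (len(w) - len(parts))

def check_eq_9_22(N, M):
    """True if every nonzero generalized coefficient has a packed witness w' as in Eq. 9.22."""
    vectors = [v for v in product(range(M + 1), repeat=N) if sum(v) <= M]
    for w in (v for v in vectors if sum(v) == M):
        for r in vectors:
            g = gen_multinomial(w, r)
            if g == 0:
                continue
            w_prime = witness(w, r)
            if w_prime is None or multinomial(w_prime) != g or g < multinomial(w):
                return False
    return True

print(check_eq_9_22(4, 3))   # M < N
print(check_eq_9_22(2, 4))   # M > N: the packing can fail
```

With $N = 2$, $M = 4$, $w = (2,2)$ and $r_v = (1,1)$ the generalized coefficient is 24, which exceeds every ordinary multinomial coefficient available in $S'$ for those parameters, so the enabling condition is not merely a convenience.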

Since $|P^{(r)}|_{K_m}$ includes every row of $P^{(r)}$ which is differentiated to nonzero order (e.g. $\forall n \in S' \ni \|r_n\| > 0$), it follows from Eq. 9.18, Eq. 9.20 and Eq. 9.21 that any row $n$ for which $\|r_n\| > M$ introduces an all zero row into $|P^{(r)}|_{K_m}$, making both $|P^{(r)}|_{K_m} = 0$ and $C(m,r) = 0$. Therefore, $\|r_n\| \le M$ represents a suitable upper bound on $\|r_n\|$ for $n \ne m$. This bound, along with the previously established condition $\|r_m\| = 0$, implies $\|r\| \le (N' - 1) \times M$ and permits the following revision of the Eq. 9.17 definition of $C(m,r)$

$$C(m,r) = \begin{cases} \dfrac{(-1)^{N'-k-1}\left\{|P^{(r)}|_{K_m}\right\}_{\alpha=1}}{r!} & \|r_m\| = 0,\ \|r_n\| \le M \text{ for } n \ne m \\[2ex] 0 & \text{otherwise} \end{cases}$$

Further, the conditions in this result can be expressed in terms of $S''$, yielding

$$C(m,r) = \begin{cases} \dfrac{(-1)^{N'-k-1}\left\{|P^{(r)}|_{K_m}\right\}_{\alpha=1}}{r!} & r_m = 0,\ r_n \in S'' - \{0\} \text{ for } n \ne m \\[2ex] 0 & \text{otherwise} \end{cases} \qquad \text{Eq. 9.23}$$


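At $\alpha = 1$, using Eq. 9.6 in Eq. 9.21 yields

$$\left\{P^{(r)}(w \mid v)\right\}_{\alpha=1} = \binom{M}{w, r_v} \prod_{i \in S} \left[r(i,v)!\,(1/2^L)^{w(i) - r(i,v)}\right] = \binom{M}{w, r_v} (1/2^L)^{M - \|r_v\|} \prod_{i \in S} [r(i,v)!]. \qquad \text{Eq. 9.24}$$

Thus, every element in row $v \in K_m$ of $\{|P^{(r)}|_{K_m}\}_{\alpha=1}$ includes the constant factor

$$(1/2^L)^{M - \|r_v\|} \prod_{i \in S} [r(i,v)!].$$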

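Substituting Eq. 9.24 into Eq. 9.18 and collecting these common row factors outside the determinant yields

$$\left\{|P^{(r)}|_{K_m}\right\}_{\alpha=1} = (1/2^L)^{(k+1)M - \|r\|} \times \prod_{v \in K_m} \prod_{i \in S} r(i,v)! \times \begin{vmatrix}
\binom{M}{m, r_m} & \binom{M}{n_1, r_m} & \binom{M}{n_2, r_m} & \cdots & \binom{M}{n_k, r_m} \\
\binom{M}{m, r_{n_1}} & \binom{M}{n_1, r_{n_1}} & \binom{M}{n_2, r_{n_1}} & \cdots & \binom{M}{n_k, r_{n_1}} \\
\binom{M}{m, r_{n_2}} & \binom{M}{n_1, r_{n_2}} & \binom{M}{n_2, r_{n_2}} & \cdots & \binom{M}{n_k, r_{n_2}} \\
\vdots & \vdots & \vdots & & \vdots \\
\binom{M}{m, r_{n_k}} & \binom{M}{n_1, r_{n_k}} & \binom{M}{n_2, r_{n_k}} & \cdots & \binom{M}{n_k, r_{n_k}}
\end{vmatrix}.$$

Also, since $r_m = 0$

$$\binom{M}{w, r_m} = \binom{M}{w, 0} = \binom{M}{w}$$

and since $\prod_{v \in K_m} \prod_{i \in S} r(i,v)! = r!$,

$$\left\{|P^{(r)}|_{K_m}\right\}_{\alpha=1} = (1/2^L)^{(k+1)M - \|r\|} \times r! \times \begin{vmatrix}
\binom{M}{m} & \binom{M}{n_1} & \binom{M}{n_2} & \cdots & \binom{M}{n_k} \\
\binom{M}{m, r_{n_1}} & \binom{M}{n_1, r_{n_1}} & \binom{M}{n_2, r_{n_1}} & \cdots & \binom{M}{n_k, r_{n_1}} \\
\binom{M}{m, r_{n_2}} & \binom{M}{n_1, r_{n_2}} & \binom{M}{n_2, r_{n_2}} & \cdots & \binom{M}{n_k, r_{n_2}} \\
\vdots & \vdots & \vdots & & \vdots \\
\binom{M}{m, r_{n_k}} & \binom{M}{n_1, r_{n_k}} & \binom{M}{n_2, r_{n_k}} & \cdots & \binom{M}{n_k, r_{n_k}}
\end{vmatrix}$$

and substitution into Eq. 9.23 yields

$$C(m,r) = (-1)^{N'-k-1} \times (1/2^L)^{(k+1)M - \|r\|} \times \begin{vmatrix}
\binom{M}{m} & \binom{M}{n_1} & \binom{M}{n_2} & \cdots & \binom{M}{n_k} \\
\binom{M}{m, r_{n_1}} & \binom{M}{n_1, r_{n_1}} & \binom{M}{n_2, r_{n_1}} & \cdots & \binom{M}{n_k, r_{n_1}} \\
\binom{M}{m, r_{n_2}} & \binom{M}{n_1, r_{n_2}} & \binom{M}{n_2, r_{n_2}} & \cdots & \binom{M}{n_k, r_{n_2}} \\
\vdots & \vdots & \vdots & & \vdots \\
\binom{M}{m, r_{n_k}} & \binom{M}{n_1, r_{n_k}} & \binom{M}{n_2, r_{n_k}} & \cdots & \binom{M}{n_k, r_{n_k}}
\end{vmatrix} \qquad \text{Eq. 9.25}$$

Note that the condition $r_m = 0$ is implicitly asserted in this result by the form of the first row of the combinatorial determinant, and that the condition $r_n \in S'' - \{0\}$ for $n \ne m$ is enforced by the definition in Eq. 9.20.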

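When Eq. 9.25 is employed with Eq. 9.9, an additional simplification becomes available. The simplification obtains by incorporating the factor

$$(1/2^L)^{-\|r\|} = 2^{L\|r\|}$$

present in the Eq. 9.25 definition of $C(m,r)$ with the product factor in Eq. 9.9. That is

$$-|P_m - I| = |P - I| - |P_m - I| = \sum_{r} C'(m,r) \times \prod_{n \in S'} \prod_{i \in S} \left(2^L P(i \mid n) - 1\right)^{r(i,n)} \qquad \text{Eq. 9.26}$$

where

$$C'(m,r) = C(m,r) \times (1/2^L)^{\|r\|}$$

is the coefficient of the indicated monomial in the new variables. Substitution of Eq. 9.25 into this expression for $C'(m,r)$ yields

$$C'(m,r) = (-1)^{N'-k-1} \times (1/2^L)^{(k+1)M} \times \begin{vmatrix}
\binom{M}{m} & \binom{M}{n_1} & \cdots & \binom{M}{n_k} \\
\binom{M}{m, r_{n_1}} & \binom{M}{n_1, r_{n_1}} & \cdots & \binom{M}{n_k, r_{n_1}} \\
\vdots & \vdots & & \vdots \\
\binom{M}{m, r_{n_k}} & \binom{M}{n_1, r_{n_k}} & \cdots & \binom{M}{n_k, r_{n_k}}
\end{vmatrix} \qquad \text{Eq. 9.27}$$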
The special case $r = 0 \Leftrightarrow k = 0 \Leftrightarrow K_m = \{m\}$ of this result is $C'(m,0)$, the coefficient of the single order 0 (constant) monomial term previously represented by Eq. 9.8.

Eq. 9.26 and 9.27 represent an exact expression of the value of $-|P_m - I| = |P - I| - |P_m - I|$, and with Propositions 6.7 and 6.8 constitute an exact representation of the components of the stationary distributions of the two and three-operator algorithms. The utility of the representation depends upon the succinct representation of the components of the array $[P(i \mid n)]$ as rational functions of the algorithm parameters and objective function (Eq. 4.2, 4.11, 4.22 and 4.24) and the ability to extract useful relationships between the $C'(m,r)$'s from the general form represented by Eq. 9.27. Section 9.5 below points out some key coefficient identities related to the latter task, but first the results of this section are extended to include the limiting case represented by Propositions 7.5 and 7.6.

9.4  The Limiting Case $\alpha \to 0^+$

The matrices introduced in Section 7.4 for use in evaluating the stationary distribution zero mutation probability limit, $P(m_A)'$ and $P_{m_A}'$, are very similar in form to $P$ and $P_m$. Specifically, $P(m_A)'$ is a row stochastic matrix and $P_{m_A}'$ is generated from it by setting row $m_A$ to $0^T$. Further, by virtue of their construction (Eq. 7.7 and 7.10), the single row dependence of the matrix elements on the conditional probability array $[P(i \mid n)]$ employed in developing the results in Section 9.3 for $P$ and $P_m$ applies to $P(m_A)'$ and $P_{m_A}'$ as well. Thus, Eq. 9.26-27 should extend with very little modification to the determinants $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$, whose zero mutation probability limits are required by Propositions 7.5-6. The following paragraphs highlight the required modifications and employ the result to examine two simple examples.

In the $\alpha \to 0^+$ counterpart of Eq. 9.26-27, $m$ is limited to membership in the set of one-operator absorbing states (i.e. $m = m_A \in S_A'$), a consequence of which is that

$$\binom{M}{m_A} = 1.$$

Also, all rows of the determinant other than $m_A$ correspond to nonabsorbing states (i.e. $n \in K \subset S' - S_A'$). Thus, the determinant order is $N' - N + 1$ and the differentiation index array is order $(N' - N + 1) \times N$ with rows corresponding to row indices $n \in S' - S_A' + \{m_A\}$. The rows of $r$ are limited to $r_{m_A} = 0$ and $r_n \in S''$ for $n \in S' - S_A'$. Further, if $r$ indicates nonzero order differentiation of any rows which are adjacent to one-operator absorbing states, then the associated columns of the combinatorial determinant must reflect the coefficient contribution from the adjacent absorbing state. Thus, if $C_0'(m_A, r)$ denotes the limiting counterpart of $C'(m,r)$ in Eq. 9.27 and if $K = \{n_1, n_2, \cdots, n_k\} \subset S' - S_A'$ where (in the state adjacency notation introduced in Section 7.3) $n_1 \in S(n_A)' \ne S(m_A)'$ and where $n_2, \cdots, n_k$ all satisfy $n_j \notin S_A''$, then the coefficient of the order $k$ monomial term uniquely identified by $r$ is given by

Eq. 9.28
It is noted that Eq. 9.28 is only an example, not a definition. It must be adjusted based upon $r$ to reflect the number and location of the nonzero adjacent state contributions.

The values of the determinants $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$ are given by employing Eq. 9.28 in Eq. 9.26 (with $r$ restricted as noted above). Further, the $\alpha \to 0^+$ limits of the $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$ are provided by using the $\alpha \to 0^+$ limits of the factors $(2^L P(i \mid n) - 1)$ in Eq. 9.26. Those limits are provided by using either $P_1(i \mid n)$ or $P_2(i \mid n)'$ depending upon whether the two or three-operator case is under consideration.

It is instructive to apply these results to a simple example. The following paragraphs do so for the one-bit problem with population size 2. These parameters ($L = 1$, $M = 2$) imply that $S = \{0, 1\}$, $N = 2$, $S' = \{(20), (11), (02)\}$, $N' = 3$, $S_A' = \{(20), (02)\}$ and $S' - S_A' = \{(11)\}$. Thus $r$ is limited to

and the combinatorial determinant required for evaluation of the nonzero order $C_0'(m_A, r)$'s for $m_A = (20)$ by Eq. 9.28 has the general form

Evaluation of the zero order coefficient proceeds as follows

The coefficient corresponding to $r_{(11)} = (10)$ is given by

and similarly for $r_{(11)} = (01)$,


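$$C_0'\!\left((20), \begin{pmatrix} 01 \\ 00 \end{pmatrix}\right) = \frac{1}{16} \begin{vmatrix} \binom{2}{(20)} & \binom{2}{(11)} + \binom{2}{(02)} \\[1ex] \binom{2}{(20),(01)} & \binom{2}{(11),(01)} + \binom{2}{(02),(01)} \end{vmatrix} = \frac{1}{16} \begin{vmatrix} 1 & 3 \\ 0 & 4 \end{vmatrix} = \frac{1}{4}.$$

In like fashion, the remaining coefficients are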
With the required coefficients provided above the value of $-|P_{(20)}' - I'|$ for $m_A = (20)$ can be expressed (by Eq. 9.26) as

$$\begin{aligned}
-|P_{(20)}' - I'| ={}& -\frac{1}{4} - \frac{1}{4}\left(2P(0 \mid (11)) - 1\right) + \frac{1}{4}\left(2P(1 \mid (11)) - 1\right) \\
&+ \frac{1}{8}\left(2P(0 \mid (11)) - 1\right)\left(2P(1 \mid (11)) - 1\right) \\
&- \frac{3}{16}\left(2P(0 \mid (11)) - 1\right)^2 + \frac{1}{16}\left(2P(1 \mid (11)) - 1\right)^2
\end{aligned} \qquad \text{Eq. 9.29}$$

Then, since $P(0 \mid 11) + P(1 \mid 11) = 1 \Rightarrow (2P(1 \mid 11) - 1) = -(2P(0 \mid 11) - 1)$, Eq. 9.29 simplifies to

$$-|P_{(20)}' - I'| = -P(0 \mid 11)^2. \qquad \text{Eq. 9.30}$$
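The $M = 2$ expansion can be checked numerically. The sketch below is a hedged illustration, not the author's procedure: it assumes that $C_0'$ carries the same prefactor as $C'(m,r)$, with $N'$ replaced by the reduced determinant order $N' - N + 1 = 2$, and that the column for state $(11)$ absorbs the contribution of the adjacent absorbing state $(02)$, as the garbled determinant for $r_{(11)} = (01)$ suggests. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod

def gen_multinomial(w, r):
    """Eq. 9.20: M! / prod((w_i - r_i)! r_i!), or 0 if some r_i exceeds w_i."""
    if any(ri > wi for wi, ri in zip(w, r)):
        return 0
    return factorial(sum(w)) // prod(factorial(wi - ri) * factorial(ri)
                                     for wi, ri in zip(w, r))

L, M = 1, 2
ZERO = (0, 0)
m_A, n1, absorbed = (2, 0), (1, 1), (0, 2)   # (02) is folded into the column of (11)
S2 = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]   # S'' for N = 2, M = 2

def coeff(r):
    """Assumed form of C_0'((20), r), with r the differentiation row for state (11)."""
    if r == ZERO:
        # k = 0: sign (-1)^(2-0-1) = -1, factor (1/2^L)^M, 1x1 determinant
        return Fraction(-gen_multinomial(m_A, ZERO), 2 ** (L * M))
    row_m = [gen_multinomial(m_A, ZERO),
             gen_multinomial(n1, ZERO) + gen_multinomial(absorbed, ZERO)]
    row_n = [gen_multinomial(m_A, r),
             gen_multinomial(n1, r) + gen_multinomial(absorbed, r)]
    det = row_m[0] * row_n[1] - row_m[1] * row_n[0]
    # k = 1: sign (-1)^(2-1-1) = +1, factor (1/2^L)^(2M)
    return Fraction(det, 2 ** (L * 2 * M))

def expansion(p0):
    """Sum of coeff(r) * prod((2 p_i - 1) ** r_i) over r in S''."""
    p = (p0, 1 - p0)
    return sum(coeff(r) * prod((2 * p[i] - 1) ** r[i] for i in range(2)) for r in S2)

p0 = Fraction(1, 3)          # arbitrary illustrative value of P(0 | 11)
print(expansion(p0), -p0 ** 2)
```

Under these assumptions the six coefficients are $-1/4$, $-1/4$, $1/4$, $-3/16$, $1/8$ and $1/16$ for $r_{(11)} = (00), (10), (01), (20), (11), (02)$, and the sum collapses to $-P(0 \mid 11)^2$ for any value of $P(0 \mid 11)$, in agreement with Eq. 9.30.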
From the symmetry inherent in the problem, it follows that the $m_A = (02)$ counterpart of Eq. 9.30 is

$$-|P_{(02)}' - I'| = -P(1 \mid 11)^2, \qquad \text{Eq. 9.31}$$

and employing Eq. 9.30-31 with Proposition 7.5 yields (for the two-operator case)

$$q_0(20) = \frac{P_1(0 \mid 11)^2}{P_1(0 \mid 11)^2 + P_1(1 \mid 11)^2}$$

and  Eq. 9.32

$$q_0(02) = \frac{P_1(1 \mid 11)^2}{P_1(0 \mid 11)^2 + P_1(1 \mid 11)^2}.$$

Then, substituting Eq. 4.2 in Eq. 9.32 yields

$$q_0(20) = \frac{R(0)^2}{R(0)^2 + R(1)^2}$$

and  Eq. 9.33

$$q_0(02) = \frac{R(1)^2}{R(0)^2 + R(1)^2}.$$

The limit for the nonabsorbing state $m = (11)$ is known to be zero by Proposition 7.5. An identical result to Eq. 9.33 obtains for the three-operator case because for the one-bit problem, crossover is nullified and $P_2(i \mid n)' = P_1(i \mid n)$ (see Eq. 4.22).

Additional insight into the behavior of the limiting stationary distribution is obtainable by examining the one-bit problem with population size 3. These parameters ($L = 1$, $M = 3$) leave $S$ and $N$ unaltered but change the other state space related sets and parameters to $S' = \{(30), (21), (12), (03)\}$, $N' = 4$, $S_A' = \{(30), (03)\}$ and $S' - S_A' = \{(21), (12)\}$. By retracing the previous development (the $M = 2$ case) with $r$ limited as indicated by these state space sets, results analogous to Eq. 9.32 and 9.33 are obtained. Thus, the $M = 3$ counterpart of Eq. 9.32 is

$$q_0(30) = \frac{P_1(0 \mid 21)^3\left[P_1(0 \mid 12)^3 + 3P_1(0 \mid 12)^2 P_1(1 \mid 12)\right] + P_1(0 \mid 12)^3\left[P_1(1 \mid 21)^3 + 3P_1(0 \mid 21) P_1(1 \mid 21)^2\right]}{D}$$

and  Eq. 9.34

$$q_0(03) = \frac{P_1(1 \mid 12)^3\left[P_1(1 \mid 21)^3 + 3P_1(1 \mid 21)^2 P_1(0 \mid 21)\right] + P_1(1 \mid 21)^3\left[P_1(0 \mid 12)^3 + 3P_1(1 \mid 12) P_1(0 \mid 12)^2\right]}{D}$$

where

$$\begin{aligned}
D ={}& P_1(0 \mid 21)^3\left[P_1(0 \mid 12)^3 + 3P_1(0 \mid 12)^2 P_1(1 \mid 12)\right] + P_1(0 \mid 12)^3\left[P_1(1 \mid 21)^3 + 3P_1(0 \mid 21) P_1(1 \mid 21)^2\right] \\
&+ P_1(1 \mid 12)^3\left[P_1(1 \mid 21)^3 + 3P_1(1 \mid 21)^2 P_1(0 \mid 21)\right] + P_1(1 \mid 21)^3\left[P_1(0 \mid 12)^3 + 3P_1(1 \mid 12) P_1(0 \mid 12)^2\right].
\end{aligned}$$

The Eq. 9.33 counterpart is

$$q_0(30) = \frac{[2R(0)]^3\left[R(0)^3 + 6R(0)^2 R(1)\right] + R(0)^3\left[R(1)^3 + 6R(0) R(1)^2\right]}{D'}$$

and  Eq. 9.35

$$q_0(03) = \frac{[2R(1)]^3\left[R(1)^3 + 6R(1)^2 R(0)\right] + R(1)^3\left[R(0)^3 + 6R(1) R(0)^2\right]}{D'}$$

where

$$\begin{aligned}
D' ={}& [2R(0)]^3\left[R(0)^3 + 6R(0)^2 R(1)\right] + R(0)^3\left[R(1)^3 + 6R(0) R(1)^2\right] \\
&+ [2R(1)]^3\left[R(1)^3 + 6R(1)^2 R(0)\right] + R(1)^3\left[R(0)^3 + 6R(1) R(0)^2\right].
\end{aligned}$$

Again, the three-operator case yields an identical result.
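The two one-bit examples can be evaluated directly for given reward values. The sketch below is illustrative only and assumes proportional selection, $P_1(i \mid n) = n(i)R(i) / \sum_j n(j)R(j)$, which is the form implied by the passage from Eq. 9.34 to Eq. 9.35 (Eq. 4.2 itself is not reproduced in this section). The function names and the reward values are hypothetical.

```python
from fractions import Fraction

def p1(i, n, R):
    """Assumed proportional selection: P_1(i|n) = n(i) R(i) / sum_j n(j) R(j)."""
    return Fraction(n[i] * R[i], sum(nj * Rj for nj, Rj in zip(n, R)))

def q0_M2(R):
    """Eq. 9.32 evaluated with the assumed P_1 (one-bit problem, M = 2)."""
    a, b = p1(0, (1, 1), R), p1(1, (1, 1), R)
    D = a**2 + b**2
    return {(2, 0): a**2 / D, (0, 2): b**2 / D}

def q0_M3(R):
    """Eq. 9.34 evaluated with the assumed P_1 (one-bit problem, M = 3)."""
    a, b = p1(0, (2, 1), R), p1(1, (2, 1), R)
    c, d = p1(0, (1, 2), R), p1(1, (1, 2), R)
    num30 = a**3 * (c**3 + 3 * c**2 * d) + c**3 * (b**3 + 3 * a * b**2)
    num03 = d**3 * (b**3 + 3 * b**2 * a) + b**3 * (c**3 + 3 * d * c**2)
    D = num30 + num03
    return {(3, 0): num30 / D, (0, 3): num03 / D}

def q0_30_from_R(R):
    """Eq. 9.35 written directly in terms of the reward values."""
    R0, R1 = R
    n30 = (2 * R0)**3 * (R0**3 + 6 * R0**2 * R1) + R0**3 * (R1**3 + 6 * R0 * R1**2)
    n03 = (2 * R1)**3 * (R1**3 + 6 * R1**2 * R0) + R1**3 * (R0**3 + 6 * R1 * R0**2)
    return Fraction(n30, n30 + n03)

R = (3, 1)                       # illustrative integer rewards with R(1) < R(0)
two, three = q0_M2(R), q0_M3(R)
print(two[(0, 2)] / two[(2, 0)], three[(0, 3)] / three[(3, 0)])
```

For these rewards the $M = 2$ ratio equals $[R(1)/R(0)]^2 = 1/9$ exactly, while the $M = 3$ ratio falls strictly below $[R(1)/R(0)]^3 = 1/27$; the probability-form and reward-form expressions for $q_0(30)$ agree.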

These examples suggest two very significant conjectural features of the limiting stationary distribution behavior. First, only order 2 monomial terms survive in the determinant expansions of the $M = 2$ case and only order 6 terms survive for $M = 3$. These facts lead to the supposition that in general, only order $M \times (N' - N)$ terms survive. In the $M = 2$ case, $M \times (N' - N) = 2 \times (3 - 2) = 2$ while for $M = 3$, $M \times (N' - N) = 3 \times (4 - 2) = 6$. If this supposition is correct, then the polynomial forms required for evaluating the stationary distribution zero mutation probability limit by Propositions 7.5 and 7.6 are homogeneous order $M \times (N' - N)$ polynomials in the $P(i \mid n)$'s. Presumably, the corresponding property (i.e. homogeneous order $M \times (N' - 1)$ polynomial forms) applies to the general case represented by Propositions 6.7 and 6.8.

A second conjecture concerns the limiting distribution behavior as a function of the parameter $M$. The computed limiting distribution entropy results displayed in Section 5.5 suggest that the limiting distribution is dominated by optimal solutions for $M$ sufficiently large. That supposition is supported by the results in Eq. 9.33 and 9.35. In the $M = 2$ case, it follows immediately from Eq. 9.33 that $q_0(02)/q_0(20) = [R(1)/R(0)]^2$. For $M = 3$ and $R(1) < R(0)$ it is straightforward to show that a corresponding bounding relationship exists, i.e. $q_0(03)/q_0(30) < [R(1)/R(0)]^3$. This suggests that the ratio of the probabilities of the uniform population states corresponding to $i$ and $j$ with $R(i) < R(j)$ behaves at or better than

$$[R(i)/R(j)]^M \to 0 \qquad \text{Eq. 9.36}$$

for $M$ sufficiently large. If this supposition is indeed correct, then the desired limiting distribution behavior for the two-operator simple genetic algorithm (i.e. probability zero for sub-optimal solutions) can be approached as closely as required by selecting $M$ sufficiently large.

The corresponding general case (i.e. $L > 1$) three-operator counterparts of Eq. 9.32 and 9.34 are expressed in terms of the $P_2(i \mid n)'$ array (Eq. 4.22). Thus, the numerator polynomial counterparts of Eq. 9.33 and 9.35 are expressed in terms of complex polynomial functions of the reward function values, and consequently it may be that no general case three-operator counterpart of Eq. 9.36 exists. (It is noted that the design of the reward functions employed in Section 5, in which only length 0-2 schema dependence is incorporated, tends to minimize crossover disruption, which may account for the progression toward optimality indicated by the three-operator results recorded in Figures 5-7 through 5-18.) The simulated annealing global optimality may thus extrapolate onto the simple genetic algorithm only in the $p_c \to 0$ and $M \to \infty$ limiting sense.

9.5  Extending the Stationary Distribution Representation

Eq. 9.26 and 9.27 represent an exact expression of the value of the determinant $-|P_m - I| = |P - I| - |P_m - I|$, and with Propositions 6.7 and 6.8 constitute an exact representation of the components of the stationary distribution of the two and three-operator algorithms. Section 9.4 extends those results to the determinants $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$ whose $\alpha \to 0^+$ values are required for use in Propositions 7.5-6. The utility of these representations depends upon the ability to extract useful relationships between the $C'(m,r)$'s from the general form represented by Eq. 9.27 and Eq. 9.28. The following paragraphs examine the combinatorial determinants in the general forms provided by Eq. 9.27 and Eq. 9.28 and deduce some of the key relationships. The purpose of this effort is to provide a foundation for extending the stationary distribution representation methodology developed in Sections 9.2-9.4.

First, if the enabling condition for Eq. 9.22 is satisfied (i.e. $M < N$), then every element in the combinatorial determinant of Eq. 9.27 is either zero or it is the combinatorial determinant corresponding to the order zero coefficient for some state in $S'$. Thus, every coefficient of the form represented by Eq. 9.27 can be written as sums and products of order zero coefficients. An analogous conclusion applies to Eq. 9.28.

Second, it is clear from Eq. 9.27 that nonzero order differentiation of any two or more rows of $-|P_m - I| = |P - I| - |P_m - I|$ in an identical pattern (e.g. $r': 0 \ne r'_{n_1} = r'_{n_2}$ for $n_1 \ne n_2$) introduces identical rows into the combinatorial determinant, and thus makes $C'(m,r') = 0$. Consequently, no monomial terms corresponding to any $r'$ with identical nonzero rows survive in the expansion of $-|P_m - I| = |P - I| - |P_m - I|$. An identical conclusion applies to the coefficients of $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$, of which Eq. 9.28 is an exemplar.

A  very  important  class  of  coefficient  identities  derives  from  transpositions  of  non- 
zero rows  and  columns  of  the  differentiation  order  array.  The  resulting  identities  are  very 
closely  connected  to  the  algebra  of  symmetric  and  alternating  polynomials,  and  to  an 
associated  determinant  concept  called  alternants,  of  which  Vandermonde  determinants 
are  a  special  case.  Appendix  C  is  provided  to  support  the  following  paragraphs. 

From the form of the combinatorial determinants in Eq. 9.27 and Eq. 9.28, it is clear that exchanging any two of the $k$ rows indexed by row indices $n \in K$ is equivalent to exchanging the corresponding nonzero rows of $r$. If $r'$ is derived from $r$ by such a row transposition, then it follows that

$$C'(m,r') = -C'(m,r).$$

Thus, $C'(m,r)$ establishes the value (to within a sign alternation) of the coefficients of $k!$ distinct monomial terms in the expansion of $-|P_m - I| = |P - I| - |P_m - I|$. An identical result applies to $C_0'(m_A,r)$ and the expansion of $-|P_{m_A}' - I'| = |P(m_A)' - I'| - |P_{m_A}' - I'|$.
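Both the sign alternation under row transposition and the vanishing of coefficients with identical nonzero rows can be seen on a small combinatorial determinant. The sketch below is an illustration under stated assumptions: the determinant element in row $v$, column $w$ is taken to be the generalized multinomial coefficient of Eq. 9.20, the states and differentiation rows are chosen arbitrarily for $L = 1$, $M = 3$, and the helper names are hypothetical. Since the prefactor of $C'(m,r)$ depends only on $k$, the sign of the determinant carries over to the coefficient.

```python
from math import factorial, prod

def gen_multinomial(w, r):
    """Eq. 9.20: M! / prod((w_i - r_i)! r_i!), or 0 if some r_i exceeds w_i."""
    if any(ri > wi for wi, ri in zip(w, r)):
        return 0
    return factorial(sum(w)) // prod(factorial(wi - ri) * factorial(ri)
                                     for wi, ri in zip(w, r))

def det(a):
    """Laplace expansion along the first row; adequate for small integer matrices."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def comb_det(states, r_rows):
    """Determinant with element gen_multinomial(w, r_v) in row v, column w."""
    return det([[gen_multinomial(w, r_v) for w in states] for r_v in r_rows])

states = [(3, 0), (2, 1), (1, 2)]           # m, n_1, n_2 (illustrative choice)
r      = [(0, 0), (1, 0), (2, 0)]           # r_m = 0, two distinct nonzero rows
r_swap = [(0, 0), (2, 0), (1, 0)]           # nonzero rows exchanged
r_same = [(0, 0), (1, 0), (1, 0)]           # identical nonzero rows

print(comb_det(states, r), comb_det(states, r_swap), comb_det(states, r_same))
# prints: -9 9 0
```

Each determinant row depends on the state $v$ only through $r_v$, which is why exchanging two nonzero rows of $r$ is the same as exchanging two determinant rows.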

The collection of monomial terms corresponding to this coefficient identity can be written as the product of $C'(m,r)$ (or of $C_0'(m_A,r)$) and a polynomial function of the form defined in Eq. C.12 of Appendix C. That is, the collection of terms is a quasi-alternating polynomial function in the array of variables $(2^L P(i \mid n) - 1)$.

In addition to the preceding result, the following identity applies to $C'(m,r)$ and the expansion of $-|P_m - I| = |P - I| - |P_m - I|$. For any $n \in K$, transposition of columns $m$ and $n$ in the combinatorial determinant of Eq. 9.27 is equivalent to representing the value of $C'(n,r')$ where $r'$ is derived from $r$ by exchanging $r_n$ with $r_m = 0$. That is

$$n \in K \Rightarrow C'(n,r') = -C'(m,r).$$

Thus, the identical quasi-alternating function, evaluated in the new set of variables generated by replacing each $P(i \mid n)$ with the corresponding $P(i \mid m)$, is included in the expansion of $-|P_n - I| = |P - I| - |P_n - I|$. Collectively, these results account for $(k + 1)!$ of the coefficients required for representation of the stationary distribution.

Another class of coefficient identities derives from transpositions of the columns of
r (i.e. transpositions of $i, j \in S$). Let m' be derived from m by setting m(j)' = m(i),
m(i)' = m(j), $n_1'$ from $n_1$ by setting $n_1(j)' = n_1(i)$, $n_1(i)' = n_1(j)$, etc. Then, if r' is derived
from r by transposition of rows m with m', $n_1$ with $n_1'$, etc. followed by transposition of
columns i and j, it follows from Eq. 9.27 that

$$C'(m', r') = C'(m, r).$$

An identical result applies to $C'_0(\hat{m}', r')$.

The number of distinct coefficients whose values are generated in this fashion from
C'(m,r) depends upon both the number and form of the nonzero columns in r. If the number
of nonzero columns is p, then exchanging any of the p nonzero columns with any of
the N - p zero columns generates the coefficient of a distinct monomial term. Exchanging
a nonzero column with another nonzero column having a different column sum also generates
a distinct coefficient. However, exchanging a nonzero column with another nonzero
column having identical column sum may or may not generate a distinct coefficient,
depending upon the distribution of the nonzero entries in the two columns, because it is
possible for the transformation described above to translate one column into the other. A
lower bound on the number of distinct coefficients thus generated is

The collection of monomial terms corresponding to this coefficient identity can be
written as the product of C'(m,r) (or of $C'_0(m,r)$) and a polynomial function of the form
defined in Eq. C.10 of Appendix C. That is, the collection of terms is a quasi-symmetric
polynomial function in the array of variables $(2^L P(i \mid n) - 1)$.

These coefficient identities and their connection to the quasi-symmetric and quasi-alternating
polynomials of Appendix C offer a promising mechanism for extending the
stationary distribution representation work begun here. Examination of the general form
$(2^L P(i \mid n) - 1)$ reveals that it is zero mean in the sense that

$$\sum_{i \in S} \bigl(2^L P(i \mid n) - 1\bigr) = 0.$$

This  property,  along  with  the  common  form  of  the  elements  in  the  conditional  probability 
array  [P(i  |  n)],  suggests  that  the  symmetric  and  alternating  polynomial  forms  required  for 
evaluation  of  Propositions  6.7-6.8  or  6.5-6.6  may  admit  to  large  scale  simplifications,  and 
ultimately  yield  a  tractable,  explicit  closed  form  expression  for  the  stationary  distribution 
components. 
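
The zero mean property can be verified in one line. The short derivation below is a sketch under two assumptions that are not restated in this section: that the index set S contains $2^L$ elements (consistent with the count of uniform populations quoted in Section 10.1), and that $P(\cdot \mid n)$ is a probability distribution over S, so that its entries sum to one.

```latex
\sum_{i \in S} \bigl(2^{L} P(i \mid n) - 1\bigr)
  = 2^{L} \sum_{i \in S} P(i \mid n) \;-\; \sum_{i \in S} 1
  = 2^{L} \cdot 1 - |S|
  = 2^{L} - 2^{L}
  = 0 .
```

The cancellation depends only on the row-sum constraint, so it holds for every conditioning state n.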


SECTION  10 

CONCLUSIONS  AND  FUTURE  DIRECTION 

10.1  Summary 

This  dissertation  reports  an  effort  to  establish  an  analytical  framework  for  the  sim- 
ple genetic  algorithm,  based  upon  the  asymptotic  probability  distribution  of  the  generated 
solution  sequences.  The  mechanism  employed  herein  is  extrapolation  of  the  extensive 
existing  theoretical  foundation  of  the  simulated  annealing  algorithm  onto  the  genetic 
algorithm.  That  foundation  is  based  upon  the  asymptotic  behavior  of  a  nonstationary 
Markov  chain  simulated  annealing  algorithm  model.  The  simulated  annealing  literature  is 
reviewed  in  Section  2,  with  particular  emphasis  on  the  methodology  employed  to  develop 
the key theoretical results. Those results include a demonstration that, provided a lower
bound of the form K/log(k) on the algorithm parameter corresponding to absolute temperature
is observed, the asymptotic probability distribution over the algorithm state space is
zero for all states corresponding to sub-optimal solutions. Thus, the simulated annealing
algorithm obtains (asymptotically) a globally optimal solution.

The  genetic  algorithm  literature  is  reviewed  in  Section  3.  The  significant  conclusion 
of that section is that while certain important theoretical results exist, notably the
so-called schema theorem and some work on a problem construct referred to as the minimal
deceptive  problem,  no  genetic  algorithm  model  or  accompanying  convergence  theory 
comparable  in  scope  to  that  of  simulated  annealing  exists  in  the  literature.  The  fundamen- 
tal purpose  of  the  work  described  herein  is  to  provide  such  an  analytical  framework  by 
extrapolating  the  known  simulated  annealing  theory  onto  the  genetic  algorithm. 

An essential first step toward that goal is development of a nonstationary Markov
chain algorithm model for the genetic algorithm. That task is accomplished in Section 4.
The product of that effort is a very general nonstationary Markov chain model for variants
of the algorithm incorporating combinations of the three fundamental genetic algorithm
operators. The model is tailored to resemble the model employed in the analysis of
the simulated annealing algorithm convergence behavior, with the mutation probability
algorithm parameter playing a role analogous to absolute temperature in simulated
annealing.

Additionally, some salient features of the model state behavior are pointed out in
Section 4. In particular, the one-operator (reproduction only) simple genetic algorithm is
shown to possess exactly $2^L$ absorbing states, one for each possible uniform population,
while the two-operator (reproduction/mutation) and three-operator (reproduction/mutation/crossover)
variants possess a unique stationary distribution. The expected value of
the absorption time for the one-operator algorithm is finite and an upper bound is provided
by Eq. 4.8. The probability distribution of the final solution state produced by the
one-operator simple genetic algorithm depends upon the initial state, $m_0$.

The inclusion of the mutation operator is shown in Section 4 to provide a significant
additional dimension to the state behavior of the time-homogeneous (stationary) two- and
three-operator variants of the algorithm: the existence of a unique stationary distribution.
The significance of the unique stationary distribution is that the asymptotic state behavior
is independent of the starting state. It is completely determined by the objective function
and the algorithm parameters.

In Section 5, the genetic algorithm model is employed to generate some computer
simulation results. Specifically, a combinatorial interpretation of the model state space is
explored numerically in Section 5.2 and the limiting stationary distribution of the
three-operator algorithm is approximated for a variety of algorithm parameter sets in Section
5.5. A very significant feature of the limiting stationary distribution is suggested by the
Section 5.5 results and later verified theoretically (in Section 7): the limiting two- and
three-operator algorithm stationary distribution behavior necessary for extrapolating
the simulated annealing asymptotic global optimality result does not follow. The limiting
distribution is nonzero for all states corresponding to uniform populations (one-operator
absorbing states), including those representing sub-optimal solutions. This complication
precludes an exact extrapolation of the simulated annealing convergence theory onto the
simple genetic algorithm. The Section 5 results do, however, reinforce the intuitive notion
that increasing the algorithm population size parameter biases the limiting distribution
towards the desired limiting behavior.

Section 6 employs the Perron-Frobenius Theorem (which is summarized in Appendix
B) to formulate the time-homogeneous two- and three-operator algorithm unique stationary
distribution existence argument into a system of equations whose solution is the
stationary distribution components. The solution is formulated in terms of Cramer's Rule
and is not explicitly solved; however, the Section 6 results provide a remarkable degree of
insight into the form of the solution and its behavior with respect to the algorithm parameters.
Those results provide the foundation for the remaining sections.

The unique stationary distribution existence argument for the stationary two- and
three-operator algorithm variants only applies when the mutation probability parameter is
strictly greater than zero. A one-operator (zero mutation probability) stationary distribution
exists but, as demonstrated in Section 4.3.1, it is not unique. A very important requirement
for extrapolation of the simulated annealing convergence theory onto the simple
genetic algorithm is existence of a zero mutation probability limit for the stationary
distribution. Section 7 is devoted to resolving that question affirmatively. It is based upon
the results developed in Section 6 and it also verifies the Section 5.5 observation concerning
the nonzero limit for all states corresponding to uniform populations.

A very significant theoretical contribution of this work is developed in Section 8. It
is a monotonic mutation probability bound sufficient to guarantee strong ergodicity of the
nonstationary two- and three-operator simple genetic algorithm Markov chains. The
parameter bound is analogous to the simulated annealing temperature schedule bound.
The bound is asserted in Proposition 8.1, and its form (i.e. j^-r) is asymptotically superior
to the K/log(k) bound associated with the simulated annealing algorithm. It is very noteworthy
that the same bound applies both to the two- and three-operator algorithm variants.
At least in terms of the Section 8 bound, the crossover operator does not expedite
convergence.

All  of  the  results  developed  in  Sections  7  and  8  are  obtained  without  explicitly  solv- 
ing the  stationary  distribution  system.  Section  9  attacks  the  problem  of  explicit  solution. 
It  is  a  very  extensive  and  somewhat  tedious  development.  The  product  of  that  work  is  an 
expression  for  the  general  term  in  a  multivariate  Taylor's  series  expansion  of  the  determi- 
nant form  required  for  explicit  solution  of  the  stationary  distribution  equations.  The 
results  are  expressed  in  Eq.  9.26  and  9.27  for  the  general  nonzero  mutation  probability 
case,  augmented  by  Eq.  9.28  for  the  zero  mutation  probability  limit.  These  results  stop 
short  of  a  useable  answer  but  they  do  provide  some  insight  into  the  nature  of  the  solution. 
Further,  Section  9.5  provides  some  intriguing  ideas  for  extending  the  work  started  in  Sec- 
tion 9. 

The  attempt  to  extrapolate  the  simulated  annealing  convergence  theory  onto  the 
genetic  algorithm  fails  in  the  sense  that  the  zero  mutation  probability  stationary  distribu- 
tion limits  of  the  two  and  three-operator  simple  genetic  algorithm  variants  do  not  satisfy 
the  required  form  for  extrapolation  of  the  simulated  annealing  global  optimality  result. 
However,  evidence  is  provided  which  suggests  that  for  the  two-operator  algorithm  vari- 
ant, the  required  behavior  can  be  approached  by  increasing  the  population  size  parameter 
(Eq. 9.36). The question is more complicated for the three-operator case, and as pointed
out in Section 9.4, implementation of crossover with nonzero $p_c$ may indeed preclude
convergence to global optimality even in the infinite population size limiting sense of Eq.
9.36.

The latter observation concerning crossover, along with the equivalence of the
mutation probability sequence bounds for the two- and three-operator cases noted
previously, poses some significant questions concerning the role of the crossover operator.
Indeed, from the results developed herein, it is not clear that any desirable effect on
the asymptotic algorithm behavior obtains from application of the crossover operator,
though it may have a desirable effect in expediting convergence in real (finite time) applications.
The resolution of these questions, along with a host of other applications questions
such as optimum population size, mutation and crossover probability parameter
selection, number of iterations required to achieve acceptable results, etc., requires further
progress on the stationary distribution representation task begun in Section 9.

10.2  Contributions  of  the  Research 
The  research  reported  herein  establishes  a  framework  for  modeling  the  genetic  algo- 
rithm in  terms  of  the  asymptotic  probability  distribution  of  the  solution  sequences  which 
it  produces.  Specific  significant  accomplishments  include  the  following: 

(1)  A very general nonstationary Markov chain model of one-, two- and three-operator
variants of the genetic algorithm, and a framework for analysis of
the operators based upon their impact on the state space of the Markov chain

(2)  Demonstration  of  the  existence  of  a  unique  stationary  distribution  for  the 
time-homogeneous  (stationary)  two  and  three-operator  algorithm  variants 

(3)  A  stationary  distribution  solution  in  terms  of  the  characteristic  polynomials 
of  matrices  derived  from  the  state  transition  matrix 

(4)  Demonstration  of  the  existence  of  a  zero  mutation  probability  stationary  dis- 
tribution limit  for  the  time-homogeneous  two  and  three-operator  algorithms 

(5)  A mutation probability schedule bound (analogous to the annealing schedule
bound of simulated annealing) sufficient for the nonstationary two- and three-operator
genetic algorithm variants to achieve the limiting distribution

(6)  A methodology for representing the two- and three-operator stationary distribution
components at all consistent values of mutation probability (including
the zero mutation probability limit), and a proposed approach for extending
that methodology to produce an explicit result.

10.3  Future  Direction 
In  order  to  achieve  the  stated  goal  of  this  work,  a  complete  analytical  framework  for 
the  simple  genetic  algorithm,  additional  progress  must  be  made  on  the  stationary  distribu- 
tion solution  effort  begun  in  Section  9.  The  coefficient  relationships  noted  in  Section  9.5, 
especially  the  coefficient  identities  which  attend  transpositions  of  rows  and  columns  in 
the  differentiation  order  array  and  their  connection  with  the  quasi-symmetric  and  quasi- 
alternating  polynomial  notions  presented  in  Appendix  C,  provide  a  foundation  for  pro- 
ceeding with  this  effort.  An  explicit  representation  of  the  functional  form  of  the  stationary 
distribution,  reduced  to  a  rational  function  expression  in  the  algorithm  parameters  and 
objective  function,  would  provide  a  very  valuable  theoretical  tool  for  use  in  the  analysis 
of  genetic  algorithm  performance,  and  is  the  ultimate  goal.  However,  even  if  explicit 
solution  is  not  attainable,  it  may  prove  possible  to  deduce  very  useful  bounds  on  the  sta- 
tionary distribution  components  from  continuation  of  the  Section  9  development. 

A  second  promising  area  for  continuation  of  this  work  concerns  the  mutation 
probability  parameter  sequence  bound  provided  in  Section  8.  It  is  based  on  very  simple 
lower  bounds  (Eq.  4.21  and  4.31)  which  exist  for  the  conditional  probabilities  which 
compose  the  state  transition  matrix,  and  it  only  employs  the  one-step  transition  matrix  in 
the $(1 - \tau_1(P))$ sequence employed to establish weak ergodicity (Section 8.2). Some preliminary
work not reported in Section 8 suggests that employing two-step transition
matrices in summing the $(1 - \tau_1(P))$ sequence may allow a refinement of the bound to
something  of  the  form  k~'.  It  also  appears  from  that  preliminary  work  that  the  same  bound 
applies  for  both  the  two  and  three-operator  algorithm  variants. 


APPENDIX  A 
DISCRETE  TIME  FINITE  STATE  MARKOV  CHAINS 

A.1  Introduction

The  following  paragraphs  establish  some  definitions  and  theorems  on  discrete  time 
finite  state  Markov  chains  and  related  stochastic  matrix  concepts.  These  results  fall  into 
three  main  categories,  (1)  elementary  definitions,  (2)  definitions  and  theorems  concerning 
the  state  space  and  asymptotic  behavior  of  time-homogeneous  (stationary)  Markov  chains 
and  (3)  some  more  advanced  ergodicity  definitions  and  theorems  necessary  for  the  analy- 
sis of  the  asymptotic  behavior  of  inhomogeneous  Markov  chains.  These  results  are  pres- 
ented without  proof  or  elaboration  but  the  foundation  required  for  the  more  elementary  of 
them  can  be  obtained  from  [Cinl75,  IsMa76]  or  many  other  references  on  Markov  chains. 
The  ergodicity  related  results  can  be  found  in  [IsMa76,  Sene81]. 

Although  some  of  the  results  discussed  here  apply  to  continuous  time  and/or 
denumerably  infinite  state  space  Markov  chains  as  well,  the  intention  is  to  restrict  consid- 
eration to  the  discrete  time  finite  state  case.  All  references  herein  to  Markov  chains  are 
understood  to  mean  discrete  time  finite  state  Markov  chains. 

In the following, let $K = \{0, 1, 2, \ldots\}$ be the set of nonnegative integers, let
$X = \{X_k : k \in K\}$ be a discrete time (i.e. discrete sequence index) stochastic process with
finite cardinality state space E, and let $i, j \in E$.

A.2  Elementary Definitions
Definition A1: If $\forall i, j \in E$ and every $k \in K$, it follows that

$$\Pr\{X_{k+1} = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_k = i\} = \Pr\{X_{k+1} = j \mid X_k = i\},$$

then X is a Markov chain.

Definition A2: Any row vector $q^T = [q(i)]$, $i \in E$ satisfying the conditions

(1)  $\forall i \in E : q(i) \ge 0$

(2)  $q^T \mathbf{1} = \sum_{i \in E} q(i) = 1$

is called a probability vector.

Definition A3: Any square matrix P whose rows are all composed of probability vectors
is called a stochastic matrix, or sometimes more explicitly a row stochastic matrix. The
row sum constraint on a row stochastic matrix can be written as $P\mathbf{1} = \mathbf{1}$, so 1 is an eigenvalue
of every stochastic matrix.
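
Definitions A2 and A3 translate directly into a numerical check. The following is a minimal sketch in plain Python; the function names, the tolerance, and the example matrix are illustrative assumptions rather than anything taken from the original development.

```python
def is_probability_vector(q, tol=1e-12):
    """Definition A2: every entry nonnegative and the entries sum to 1."""
    return all(x >= 0.0 for x in q) and abs(sum(q) - 1.0) <= tol


def is_row_stochastic(P, tol=1e-12):
    """Definition A3: P is square and every row is a probability vector."""
    n = len(P)
    return all(len(row) == n and is_probability_vector(row, tol) for row in P)


def times_ones(P):
    """Compute P * 1, which is simply the vector of row sums."""
    return [sum(row) for row in P]


# Hypothetical three-state example used only for illustration.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

print(is_row_stochastic(P))   # every row is a probability vector
print(times_ones(P))          # P * 1 reproduces the vector 1
```

Because P * 1 returns the all-ones vector, 1 is a right eigenvector for eigenvalue 1, which is the observation made at the end of Definition A3.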

Definition A4: The stochastic matrix

$$P_k = [p_k(i,j)] = [\Pr\{X_{k+1} = j \mid X_k = i\}]$$

is the one step transition probability matrix or state transition matrix of the Markov chain
X. If the probability vectors $q_k$ and $q_{k+1}$ are respectively the probability distributions of
$X_k$ and $X_{k+1}$, then

$$q_{k+1}^T = q_k^T P_k.$$

Similarly, the stochastic matrix

$$P_{m,k} = [p_{m,k}(i,j)] = [\Pr\{X_k = j \mid X_m = i\}] = P_m P_{m+1} \cdots P_{k-1} = \prod_{l=m}^{k-1} P_l$$

where k = m + n, $m, n \in K$, n > 0 is the n-step transition probability matrix of X, and

$$q_k^T = q_m^T P_{m,k} = q_m^T \prod_{l=m}^{k-1} P_l.$$
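
The relationship between one-step propagation and the n-step matrix in Definition A4 can be illustrated numerically. The sketch below uses plain Python lists; the helper names and the two-state, time-varying example are hypothetical and chosen only to keep the arithmetic small.

```python
def vec_mat(q, P):
    """Row vector times matrix: returns q^T P as a list."""
    return [sum(q[i] * P[i][j] for i in range(len(q))) for j in range(len(P[0]))]


def mat_mul(A, B):
    """Ordinary matrix product A B for lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def n_step_matrix(Ps, m, k):
    """P_{m,k} = P_m P_{m+1} ... P_{k-1}, defined for k > m."""
    result = Ps[m]
    for l in range(m + 1, k):
        result = mat_mul(result, Ps[l])
    return result


# Hypothetical two-state chain whose switching probability a changes with k.
Ps = [[[1.0 - a, a], [a, 1.0 - a]] for a in (0.5, 0.25, 0.125)]
q0 = [1.0, 0.0]

# Propagate one step at a time: q_{k+1}^T = q_k^T P_k.
q_stepwise = q0
for P_k in Ps:
    q_stepwise = vec_mat(q_stepwise, P_k)

# Propagate in one shot with the 3-step matrix P_{0,3}.
q_direct = vec_mat(q0, n_step_matrix(Ps, 0, 3))

print(q_stepwise, q_direct)
```

Both routes give the same distribution, and it remains a probability vector because products of stochastic matrices are stochastic.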

Definition A5: Let $P_k$ be the state transition matrix of the Markov chain X at time (sequence
index) k. Then, X is time-homogeneous if and only if $\forall k \in K$ it follows that
$P_k = P$ where P is a constant state transition matrix.

A.3  Time-Homogeneous Markov Chains
The time-homogeneous Markov chain X is completely specified by its initial probability
distribution, $q_0$, and state transition matrix, P. The probability distribution of
$X_k$, $k \ge 1$ is given by

$$q_k^T = q_0^T P^k.$$

The following definitions and theorems concern the asymptotic behavior of the chain and
some conditions on the state space which make the asymptotic behavior independent of
$q_0$. In the following, let the $i, j \in E$ element of $P^k$ be denoted by $p^{(k)}(i,j)$.

Definition A6: A subset $E_0$ of the state space E of the Markov chain X is called closed if
$\forall i \in E_0$, $\forall j \in E - E_0$ it follows that p(i, j) = 0. If the closed set $E_0$ contains the single state
i, so that p(i,i) = 1, then the state i is called an absorbing state.
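
Closed sets and absorbing states are easy to detect from the transition matrix alone. The sketch below is illustrative only: the function names are assumptions, and the four-state example is a generic chain with two absorbing end states rather than the genetic algorithm model itself.

```python
def absorbing_states(P, tol=1e-12):
    """States i with p(i, i) = 1 (Definition A6, single-state closed sets)."""
    return [i for i in range(len(P)) if abs(P[i][i] - 1.0) <= tol]


def is_closed(P, subset):
    """True if p(i, j) = 0 for every i inside the subset and every j outside it."""
    inside = set(subset)
    return all(P[i][j] == 0.0
               for i in inside
               for j in range(len(P)) if j not in inside)


# Hypothetical chain: states 0 and 3 trap the process, states 1 and 2 are transient.
P = [[1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0]]

print(absorbing_states(P))     # the two trapping states
print(is_closed(P, [0, 3]))    # a union of closed sets is closed
print(is_closed(P, [1, 2]))    # the interior states can leak out
```

A chain with more than one absorbing state, like this one, cannot have a limiting behavior that is independent of the starting state, which is the situation described for the one-operator algorithm in Section 10.1.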

Definition  A7:  A  Markov  chain  is  called  irreducible  if  there  exists  no  nonempty  closed 
subset  of  its  state  space  E  other  than  E  itself. 

Definition A8: The states i and j are said to intercommunicate if

$$\exists\, k_1, k_2 \in K \ni p^{(k_1)}(i,j) > 0 \text{ and } p^{(k_2)}(j,i) > 0.$$

Theorem A1: A Markov chain is irreducible if and only if all pairs of states intercommunicate.

Definition A9: State $i \in E$ of the Markov chain X has period d if the following two conditions
hold:

(1)  $p^{(k)}(i,i) = 0$ unless k = md for some positive integer m and

(2)  d is the largest integer with property (1).

If d = 1, state i is called aperiodic. The Markov chain X is aperiodic if and only if all states $i \in E$
are aperiodic.

Theorem A2: If X is irreducible and if $\exists\, i \in E \ni p(i,i) > 0$, then X is aperiodic.
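
Theorems A1 and A2 suggest a simple computational test: irreducibility is a reachability question on the graph of nonzero transitions, and a single positive diagonal entry then suffices for aperiodicity. The sketch below is one possible realization; the function names and example matrices are assumptions, and the aperiodicity test is only the sufficient condition of Theorem A2, not a general period computation.

```python
def reachable_from(P, start):
    """All states j reachable from start in zero or more steps."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0.0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_irreducible(P):
    """Theorem A1: irreducible iff every pair of states intercommunicates."""
    n = len(P)
    return all(len(reachable_from(P, i)) == n for i in range(n))


def aperiodic_by_theorem_a2(P):
    """Sufficient test only: irreducible and p(i, i) > 0 for some state i."""
    return is_irreducible(P) and any(P[i][i] > 0.0 for i in range(len(P)))


# Hypothetical examples: a deterministic 3-cycle and a chain with self-loops.
cycle = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
lazy = [[0.5, 0.5, 0.0],
        [0.25, 0.5, 0.25],
        [0.0, 0.5, 0.5]]

print(is_irreducible(cycle), aperiodic_by_theorem_a2(cycle))
print(is_irreducible(lazy), aperiodic_by_theorem_a2(lazy))
```

The 3-cycle is irreducible but fails the sufficient test (it is in fact periodic with period 3), while the chain with self-loops passes both checks.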

Definition A10: Any probability vector q over the state space of the time-homogeneous
Markov chain X and satisfying

$$q^T P = q^T$$

is called a stationary distribution of X. It is not necessarily unique.

Theorem A3: If the Markov chain X is time-homogeneous, irreducible, aperiodic and has
a finite state space, then a stationary distribution exists for X. Further, the stationary distribution
is unique and is determined by

$$q^T P = q^T$$

and

$$\sum_{i \in E} q(i) = 1.$$
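
For a small chain, the stationarity condition $q^T P = q^T$ together with the normalization $\sum_i q(i) = 1$ can be solved directly as a linear system. The sketch below replaces one (redundant) stationarity equation with the normalization and applies Gaussian elimination with partial pivoting. It is an illustrative assumption about how one might compute q numerically, not the Cramer's Rule formulation of Section 6; the function names and example matrix are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def stationary_distribution(P):
    """Solve (P^T - I) q = 0 with the last equation replaced by sum(q) = 1."""
    n = len(P)
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return solve(A, b)


# Hypothetical irreducible, aperiodic three-state chain.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

q = stationary_distribution(P)
qP = [sum(q[i] * P[i][j] for i in range(3)) for j in range(3)]
print(q, qP)
```

The computed q is reproduced by one further transition, as the definition of a stationary distribution requires. This direct approach is only practical for small state spaces.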

Theorem A4: If the time-homogeneous Markov chain X possesses a unique stationary
distribution, q, then for every probability vector x with compatible dimensions, it follows
that

$$\lim_{k \to \infty} x^T P^k = q^T.$$

A.4  Inhomogeneous Markov Chains
Complete specification of the inhomogeneous Markov chain X requires its initial
probability distribution, $q_0$, and the infinite sequence of state transition matrices,
$\{P_k\}$, $k \ge 0$. The probability distribution of $X_k$, $k \ge 1$ is given by

$$q_k^T = q_0^T \prod_{n=0}^{k-1} P_n.$$

If the chain is asymptotically independent of $q_0$, then it is said to be ergodic. Two classes
of ergodicity must be distinguished. The following definitions and theorems elaborate.

Definition A11: The inhomogeneous Markov chain X is weakly ergodic if
$\forall i, j, l \in E$, $\forall m \in K$

$$\lim_{k \to \infty} \bigl(p_{m,k}(i,l) - p_{m,k}(j,l)\bigr) = 0.$$

Weak ergodicity does not require that either $\lim p_{m,k}(i,l)$ or $\lim p_{m,k}(j,l)$ exist.

Definition A12: Any scalar function $\tau(\cdot)$, continuous on the set of n x n stochastic matrices
P and satisfying $0 \le \tau(P) \le 1$ is called a coefficient of ergodicity. If in addition

$$\tau(P) = 0 \Leftrightarrow P = \mathbf{1} q^T$$

where q is any probability vector with compatible dimensions (i.e. when all rows of P are
identical probability vectors), then $\tau$ is said to be proper. Weak ergodicity is equivalent to

$$\tau(P_{m,k}) \to 0, \quad k \to \infty, \quad m \ge 0$$

where $\tau$ is a proper coefficient of ergodicity.

Theorem A5: Let P be an n x n stochastic matrix and let

$$\tau_1(P) = 1 - \min_{i,j} \sum_{k=1}^{n} \min\bigl(p(i,k), p(j,k)\bigr).$$

Then, $\tau_1(P)$ is a proper coefficient of ergodicity.

Theorem A6: The inhomogeneous Markov chain X is weakly ergodic if and only if there
exists a strictly increasing sequence of positive numbers $\{k_l\}$, $l \in K$ such that

$$\sum_{l=0}^{\infty} \bigl[1 - \tau_1(P_{k_l, k_{l+1}})\bigr] = \infty.$$
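
The coefficient of ergodicity of Theorem A5 and the divergence criterion of Theorem A6 can be explored numerically. The sketch below is illustrative only. It assumes a hypothetical two-state chain that switches state with probability a, takes the blocks in Theorem A6 to be single one-step matrices, and compares two made-up parameter schedules. Partial sums can only suggest, not prove, divergence or convergence.

```python
def tau1(P):
    """tau_1(P) = 1 - min over row pairs (i, j) of sum_k min(p(i,k), p(j,k))."""
    n = len(P)
    overlap = min(sum(min(P[i][k], P[j][k]) for k in range(n))
                  for i in range(n) for j in range(n))
    return 1.0 - overlap


def two_state(a):
    """Hypothetical two-state transition matrix that switches with probability a."""
    return [[1.0 - a, a], [a, 1.0 - a]]


def partial_sum(schedule, terms):
    """Sum of 1 - tau_1(P_l) for l = 0 .. terms - 1, using one-step blocks."""
    return sum(1.0 - tau1(two_state(schedule(l))) for l in range(terms))


def slow(l):
    """Schedule decaying like 1/l; the partial sums grow like log(terms)."""
    return 0.5 / (l + 1)


def fast(l):
    """Schedule decaying like 1/l^2; the partial sums stay bounded."""
    return 0.5 / (l + 1) ** 2


print(tau1([[0.2, 0.8], [0.2, 0.8]]))   # identical rows
print(tau1([[1.0, 0.0], [0.0, 1.0]]))   # rows with disjoint support
print(partial_sum(slow, 1000), partial_sum(slow, 10000))
print(partial_sum(fast, 1000), partial_sum(fast, 10000))
```

For the slowly decaying schedule the partial sums keep growing, so the criterion of Theorem A6 is met with this choice of blocks. For the rapidly decaying schedule the one-step sums level off; since Theorem A6 only requires that some block sequence produce a divergent sum, a bounded sum for one particular choice does not by itself settle the question.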


Definition A13: The inhomogeneous Markov chain X is strongly ergodic if there exists a
probability vector q satisfying

$$\forall i, j \in E,\ \forall m \in K : \lim_{k \to \infty} p_{m,k}(i,j) = q(j).$$

Thus, strong ergodicity implies convergence in distribution. The unique vector q is analogous
to its time-homogeneous chain counterpart in Theorem A4.

Theorem A7: The inhomogeneous Markov chain X is strongly ergodic if it is weakly
ergodic and if for each transition matrix, $P_k$, of X there exists a left eigenvector $q_k$ corresponding
to eigenvalue 1 such that $q_k$ is a probability vector and

$$\sum_{k=0}^{\infty} \sum_{i \in E} |q_k(i) - q_{k+1}(i)| < \infty.$$

Further, if $q = \lim_{k \to \infty} q_k$, then q is the unique vector in Definition A13.


APPENDIX  B 
THE  PERRON-FROBENIUS  THEOREM  AND  STOCHASTIC  MATRICES 

B.1  Introduction

A matrix possessing the property that all of its components are nonnegative is
referred to as a nonnegative matrix. For the matrix T, this condition is indicated by $T \ge 0$.
The case in which all components of T are strictly positive is indicated by T > 0. This
notation extends in the obvious manner to expressions such as $T \ge B \Leftrightarrow T - B \ge 0$ relating
nonnegative matrices with compatible dimensions.

The  definitions,  theorems  and  corollary  in  Section  B.2  below  concern  nonnegative 
matrices.  They  are  the  foundation  for  the  Markov  chain  stationary  distribution  existence 
and  representation  theorem  and  related  results  summarized  in  Appendix  A  and  employed 
in Sections 2, 4, 7 and 8. They are extracted from [Sene81], and are specialized in Section
B.3  from  the  case  of  finite  nonnegative  matrices  to  the  case  of  finite  stochastic  matrices. 
They  are  employed  extensively  in  Sections  6  and  7. 

B.2  The  Perron-Frobenius  Theorem  and  Ancillary  Results  for  Primitive  Matrices 

Theorem B2 below is called the strong version of the Perron-Frobenius theorem. It
applies to a class of nonnegative matrices referred to as primitive. A version of the theorem
which applies to the wider class of irreducible nonnegative matrices is usually
invoked for applications involving stochastic matrices, but the flexibility of the more
general version is not required for the purposes herein. The connection of these results to
those of Appendix A is provided by Theorem B1. It asserts that primitivity (Definition
B1) is equivalent to the combination of irreducibility and aperiodicity, as defined in
Appendix A.

Definition B1: A square nonnegative matrix, T, is primitive if there exists a positive integer
k such that $T^k > 0$.

Theorem B1: If the n x n nonnegative matrix T is irreducible (Definition A7) and aperiodic
(Definition A9), then T is primitive and conversely.
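
Definition B1 can be tested mechanically, because only the zero/nonzero pattern of T matters. The sketch below searches for the smallest power with all entries positive. It relies on Wielandt's bound, a standard result not stated in this appendix, that a primitive n x n matrix satisfies $T^k > 0$ for some $k \le (n-1)^2 + 1$; the function names and example matrices are assumptions made for illustration.

```python
def bool_mul(A, B):
    """Boolean matrix product: entry (i, j) is 1 if some l has A[i][l] and B[l][j]."""
    n = len(A)
    return [[1 if any(A[i][l] and B[l][j] for l in range(n)) else 0
             for j in range(n)] for i in range(n)]


def primitivity_exponent(T):
    """Smallest k with T^k > 0 elementwise, or None if T is not primitive."""
    n = len(T)
    pattern = [[1 if x > 0 else 0 for x in row] for row in T]
    power = pattern
    # Wielandt's bound: it suffices to test k = 1 .. (n - 1)^2 + 1.
    for k in range(1, (n - 1) ** 2 + 2):
        if all(all(row) for row in power):
            return k
        power = bool_mul(power, pattern)
    return None


# Hypothetical examples: a chain with self-loops and a deterministic 3-cycle.
lazy = [[0.5, 0.5, 0.0],
        [0.25, 0.5, 0.25],
        [0.0, 0.5, 0.5]]
cycle = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]

print(primitivity_exponent(lazy))
print(primitivity_exponent(cycle))
```

The first matrix is irreducible and aperiodic and becomes strictly positive at the second power. The cyclic matrix is irreducible but periodic, so no power is strictly positive, in agreement with Theorem B1.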

Theorem B2: Let T be an n x n nonnegative primitive matrix. Then there exists an eigenvalue
r of T such that

(a)  r is real, r > 0

(b)  r has corresponding left and right eigenvectors with strictly positive
components

(c)  $r > |\lambda|$ for any eigenvalue $\lambda \ne r$

(d)  the eigenvectors associated with r are unique to constant multiples

(e)  If $0 \le B \le T$ and $\beta$ is an eigenvalue of B, then $|\beta| \le r$. Moreover, $|\beta| = r$
implies B = T.

(f)  r is a simple root of the characteristic polynomial of T.

Definition  B2:  The  eigenvalue  r  asserted  in  Theorem  B2  is  called  the  Perron-Frobenius 
eigenvalue  of  the  nonnegative  primitive  matrix  T. 

Corollary B1: Let $T_{i,j}$ be the components of a nonnegative primitive matrix T having
Perron-Frobenius eigenvalue r. Then

$$\min_i \sum_j T_{i,j} \le r \le \max_i \sum_j T_{i,j}$$

with equality on either side implying equality throughout (i.e. r can only be equal to the
maximal or minimal row sum if all row sums are equal). A similar proposition holds for
column sums.

Theorem B3: Let T be an n x n nonnegative primitive matrix with Perron-Frobenius eigenvalue
r, let v and w be strictly positive left and right eigenvectors respectively of T
corresponding to r with v and w normed so that $v^T w = 1$, and let the $t \le n$ distinct
eigenvalues of T be ordered such that $r > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_t|$ with the additional condition
that $\lambda_2$ has multiplicity $m_2$ equal to or greater than the multiplicity of any other
eigenvalue $\lambda_i$ for which $|\lambda_i| = |\lambda_2|$. It follows that

(a)  if $\lambda_2 \ne 0$, then as $k \to \infty$

$$T^k = r^k w v^T + O(k^s |\lambda_2|^k)$$

elementwise, where $s = m_2 - 1$;

(b)  if $\lambda_2 = 0$, then for $k \ge n - 1$

$$T^k = r^k w v^T.$$
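
The row-sum bounds of Corollary B1 can be checked against a numerical estimate of the Perron-Frobenius eigenvalue. The sketch below uses plain power iteration, which is an assumption about how one might estimate r for a small primitive matrix and is not a method used in the text; the function names and the 2 x 2 example are likewise illustrative.

```python
def perron_eigenvalue(T, iterations=200):
    """Estimate the Perron-Frobenius eigenvalue of a primitive matrix by power iteration."""
    n = len(T)
    x = [1.0] * n
    r = 0.0
    for _ in range(iterations):
        y = [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]
        r = max(y)                # x is scaled so that max(x) == 1
        x = [v / r for v in y]
    return r


def row_sum_bounds(T):
    """Minimal and maximal row sums, the bounds of Corollary B1."""
    sums = [sum(row) for row in T]
    return min(sums), max(sums)


# Hypothetical strictly positive (hence primitive) matrix.
T = [[1.0, 2.0],
     [3.0, 4.0]]

r = perron_eigenvalue(T)
low, high = row_sum_bounds(T)
print(low, r, high)
```

For this example the eigenvalue is the larger root of $\lambda^2 - 5\lambda - 2 = 0$ and lies strictly between the two row sums. For a stochastic matrix all row sums equal 1, so the bounds coincide and force r = 1, which is how Corollary B1 is used in Section B.3.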

B.3  The  Perron-Frobenius  Theory  for  Stochastic  Matrices 
A  stochastic  matrix  (e.g.  the  state  transition  matrix  of  a  Markov  chain)  is  a  special 
case  of  a  square  nonnegative  matrix  in  which  all  row  sums  are  equal  to  the  constant  1. 
The  following  results  specialize  those  of  Section  B.2  to  the  case  of  T  an  nxn  stochastic 
primitive  matrix,  P. 

Theorem B4: Let P be an n x n stochastic primitive matrix. Then

(a)  r = 1 is an eigenvalue of P

(b)  r = 1 has corresponding left and right eigenvectors with strictly positive
components

(c)  $r = 1 > |\lambda|$ for any eigenvalue $\lambda \ne r$

(d)  the eigenvectors associated with r = 1 are unique to constant multiples

(e)  If $0 \le B \le P$ and $\beta$ is an eigenvalue of B, then $|\beta| \le r = 1$. Moreover,
$|\beta| = r = 1$ implies B = P.

(f)  r = 1 is a simple root of the characteristic polynomial of P.

This theorem follows immediately from Theorem B2 by application of Corollary
B1 with T a stochastic primitive matrix, P. Among its consequences are the following.

Proposition B1: The right eigenvector asserted in Theorem B4(b) and (d) can be selected
as the vector $\mathbf{1}$.

This result follows from the row sum constraint on n x n stochastic matrices, which
can be expressed as $P\mathbf{1} = \mathbf{1}$. Thus, $\mathbf{1}$ is a right eigenvector, corresponding to eigenvalue 1,
of every n x n stochastic matrix. Theorem B4 asserts that for finite primitive stochastic
matrices, it is unique to within a nonzero scalar multiple.

Proposition B2: Let the vector q be the left eigenvector asserted in Theorem B4(b) and
(d). Then, the additional constraint $q^T \mathbf{1} = 1$ is consistent and makes q unique.

Since  the  left  eigenvector  asserted  in  Theorem  B4(b)  and  (d)  has  strictly  positive 
components,  its  inner  product  with  the  vector  1  is  a  strictly  positive  (nonzero)  number. 
Consequently,  that  inner  product  can  be  used  to  normalize  the  eigenvector  to  produce  a  q 
which  satisfies  both  requirements,  and  Proposition  B2  follows. 

Proposition B3: If P is an n x n stochastic primitive matrix, then

\[ \lim_{k \to \infty} P^k = \mathbf{1} q^T \]

where q is the unique vector asserted in Proposition B2.

This result follows from Theorem B3 by specializing it to n x n stochastic primitive matrices via Theorem B4, Proposition B1 and Proposition B2. A very significant consequence is the following.

Proposition B4: If P is an n x n stochastic primitive matrix and x an arbitrary n-dimensional probability vector, then

\[ \lim_{k \to \infty} x^T P^k = x^T \mathbf{1} q^T = q^T \]

where q is the unique vector asserted in Proposition B2.

Definition  B3:  If  the  n  x  n  stochastic  primitive  matrix  P  in  Theorem  B4  and  Propositions 
B1-B4  is  the  state  transition  matrix  of  a  time  homogeneous  Markov  chain,  then  the 
unique  vector  q  asserted  in  Proposition  B2  is  the  stationary  distribution  of  the  Markov 
chain. 
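
Propositions B3 and B4 can be illustrated numerically. The following is a minimal Python sketch, not part of the original work: the 3 x 3 matrix `P` is an arbitrary primitive stochastic matrix chosen for illustration, and the function names are hypothetical. It iterates $x^T P^k$ from the uniform probability vector and also forms $P^k$ directly, so that both limits can be compared with q.

```python
def step(x, p):
    """Return the row-vector product x P."""
    n = len(p)
    return [sum(x[i] * p[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(p, tol=1e-12, max_iter=10_000):
    """Iterate x <- x P from the uniform vector until the change is below tol."""
    n = len(p)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(x, p)
        if max(abs(a - b) for a, b in zip(x, nxt)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence; is P primitive?")


def matrix_power(p, k):
    """Return P^k; each row of the running product is advanced by one step."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = [step(row, p) for row in result]
    return result


# Illustrative primitive stochastic matrix (an assumption, not from the text).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

q = stationary_distribution(P)   # Proposition B4: limit of x^T P^k
P60 = matrix_power(P, 60)        # Proposition B3: every row approaches q^T
```

For this particular matrix the left eigenvector normalized as in Proposition B2 is q = (1/4, 1/2, 1/4), and every row of `P60` agrees with it to within rounding error.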


APPENDIX  C 


VANDERMONDE  DETERMINANTS,  SYMMETRIC  AND  ALTERNATING 

POLYNOMIALS 

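C.1 Introduction
An order n determinant whose i, j element is given by $\phi_j(x_i)$ for some set of n scalar functions $\phi_j$ and a companion set of n scalar variables $x_i$, i.e.

\[
A_n(x) =
\begin{vmatrix}
\phi_1(x_1) & \phi_2(x_1) & \cdots & \phi_n(x_1) \\
\phi_1(x_2) & \phi_2(x_2) & \cdots & \phi_n(x_2) \\
\vdots & \vdots & & \vdots \\
\phi_1(x_n) & \phi_2(x_n) & \cdots & \phi_n(x_n)
\end{vmatrix}
\qquad \text{Eq. C.1}
\]
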

is called an alternant. The name derives from the fact that exchanging any pair of the variables in its argument list (e.g. $x_p$ and $x_q$) affects the value of $A_n(x) = A_n(x_1, x_2, \ldots, x_n)$ only by reversing its algebraic sign. This property is clear from Eq. C.1, because transposing the variables $x_p$ and $x_q$ in $A_n(x)$ amounts to exchanging the corresponding rows of the determinant, and by an elementary property of determinants, any such row exchange leaves the determinant value unchanged in magnitude but reversed in sign.

The state transition matrix of the Markov chain representing the simple genetic algorithm, as introduced in Section 4 of this paper, is a multivariate generalization of the matrix form underlying the alternant. The coefficient symmetries noted in Section 9.5 in connection with the stationary distribution representation development are a consequence. Section 10 proposes exploiting this connection in continuing the stationary distribution representation work begun in Section 9. This appendix provides some of the related background.
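
If the $\phi_j$ in Eq. C.1 are consecutive integer powers of their arguments, indexed from 0 through n - 1, i.e.

\[
A_n(x) = |V_n(x)| =
\begin{vmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1}
\end{vmatrix}
\qquad \text{Eq. C.2}
\]
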


then the resulting special case alternant is known as a Vandermonde determinant. The values of a Vandermonde determinant and its minors are closely related to a class of polynomials in n variables referred to as the symmetric polynomials (and to a companion class of polynomials referred to as alternating polynomials). The distinguishing feature of the symmetric polynomials is invariance with respect to permutations of the argument list (e.g. $\psi(x_1, x_2, x_3) = x_1 + x_2 + x_3 = \psi(x_2, x_1, x_3)$). Alternating polynomials reverse sign with each transposition of variables.

Section C.2 below develops an expression for the value of the order n Vandermonde determinant in Eq. C.2. The evaluation method employs the determinant form and a polynomial remainder theorem due to Bezout. Section C.3 introduces formal definitions of symmetric and alternating polynomials and a fundamental theorem which associates them with Vandermonde determinants. Section C.4 generalizes the symmetric and alternating polynomial notions to the form required by the discussion in Section 9.5.

C.2  Evaluation  of  Vandermonde  Determinants 

The value of the order n Vandermonde determinant can be deduced from its form (Eq. C.2) and a polynomial remainder theorem due to Bezout. Let $\psi(x)$ be an arbitrary polynomial function in n variables (i.e. $x = (x_1, x_2, \ldots, x_n)$) and let $x_i^{(a)}$ be generated from x by replacing $x_i$ with the value a. Then, the theorem states that if $\psi(x)$ is divided by the binomial $(x_i - a)$ the remainder is $\psi(x_i^{(a)})$ [MoSt64]. That is

\[ \psi(x) = (x_i - a)\,\phi(x) + \psi(x_i^{(a)}). \]

If a is selected as $a = x_j$ for some $j \ne i$, then $x_i^{(a)}$ contains the value $x_j$ at two distinct index locations in its list (i.e. at i and j). Consequently, if $\psi(x)$ represents the value of the Vandermonde determinant in Eq. C.2 (i.e. $\psi(x) = A_n(x) = |V_n(x)|$), then the Vandermonde determinant represented by the polynomial function $\psi(x_i^{(a)}) = A_n(x_i^{(a)}) = |V_n(x_i^{(a)})|$ contains two identical rows and hence is zero. In that case, the Bezout theorem reduces to

\[ \psi(x) = (x_i - x_j)\,\phi(x). \]

Thus $(x_i - x_j)$ is a factor of $A_n(x) = |V_n(x)|$.

This argument applies to each of the n(n - 1)/2 distinct difference factors $(x_p - x_q)$. It follows that $A_n(x) = |V_n(x)|$ can be written as

\[ A_n(x) = |V_n(x)| = \Phi(x) \prod_{p = 2,\; q < p}^{n} (x_p - x_q) \qquad \text{Eq. C.3} \]

where $\Phi(x)$ is a polynomial in $x = (x_1, x_2, \ldots, x_n)$.

Every term in the polynomial expansion of the determinant in Eq. C.2 includes exactly one factor selected from each column. Thus, the order of every such term is

\[ \sum_{k = 0}^{n-1} k = n(n - 1)/2. \]

The polynomial order of the product of difference factors in Eq. C.3 is also n(n - 1)/2. Thus the polynomial $\Phi(x)$ in Eq. C.3 is of order zero, i.e. a constant. Comparing the coefficient of the diagonal term $x_2 x_3^2 \cdots x_n^{n-1}$, which is 1 both in the determinant expansion and in the product of difference factors, shows that the constant is exactly $\Phi(x) = 1$, and consequently the value of the order n Vandermonde determinant in Eq. C.2 is

\[ A_n(x) = |V_n(x)| = \prod_{p = 2,\; q < p}^{n} (x_p - x_q) \qquad \text{Eq. C.4} \]

It follows from Eq. C.4 that $A_n(x) = |V_n(x)|$ alternates sign with every exchange of two variables in its argument list. It also follows that $A_n(x) = |V_n(x)|$ is nonzero if and only if all of the $x_k$'s are distinct.
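
Eq. C.4 can be checked on small cases. The following Python sketch (illustrative only; the function names are hypothetical and not taken from the original work) expands the determinant of Eq. C.2 as a signed sum over permutations and compares it with the product of difference factors, using exact integer arithmetic.

```python
from itertools import permutations


def perm_sign(perm):
    """Sign of a permutation, computed from its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def vandermonde_det(xs):
    """Determinant of [x_i ** j], j = 0..n-1, by permutation expansion."""
    n = len(xs)
    total = 0
    for perm in permutations(range(n)):
        term = perm_sign(perm)
        for row, col in enumerate(perm):
            term *= xs[row] ** col
        total += term
    return total


def difference_product(xs):
    """Product of (x_p - x_q) over all q < p, as in Eq. C.4."""
    prod = 1
    for p in range(1, len(xs)):
        for q in range(p):
            prod *= xs[p] - xs[q]
    return prod


xs = [2, 3, 5, 7]
det_value = vandermonde_det(xs)
prod_value = difference_product(xs)
swapped = vandermonde_det([3, 2, 5, 7])    # exchange x_1 and x_2
repeated = vandermonde_det([2, 2, 5, 7])   # two equal variables
```

With these inputs both `det_value` and `prod_value` equal 240, `swapped` equals -240, and `repeated` is zero, consistent with the two closing remarks of this section. The permutation expansion costs n! terms and is meant only for small n.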

C.3 Symmetric (and Alternating) Polynomials

A polynomial $\psi$ in n variables having the property that

\[ \psi(x_1, x_2, \ldots, x_n) = \psi(x_{\alpha_1}, x_{\alpha_2}, \ldots, x_{\alpha_n}) \qquad \text{Eq. C.5} \]

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is a permutation of the integers $(1, 2, \ldots, n)$ is said to be symmetric with respect to the given permutation. If this property applies for all n! permutations of the $x_k$'s, then $\psi$ is a symmetric polynomial [MoSt64]. Some examples of symmetric polynomials are

\[ \psi(x_1, x_2, x_3) = x_1 + x_2 + x_3 \]

and

\[ \psi(x_1, x_2, \ldots, x_n) = x_1^2 + x_2^2 + \cdots + x_n^2. \]

It is straightforward to show that any sum, difference or product of symmetric polynomials is a symmetric polynomial. In fact, the symmetric polynomials form a ring.
If the symmetric polynomial $\psi(x) = \psi(x_1, x_2, \ldots, x_n)$ includes the monomial

\[ a\, x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n} \]

among its terms, then it includes also the monomial

\[ a\, x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n} \]

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is an arbitrary permutation of the integers $(1, 2, \ldots, n)$. If for a given $p = (p_1, p_2, \ldots, p_n)$ the sum of all distinct monomials of this form is designated by

\[ \phi_p(x) = \sum x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n}, \qquad \text{Eq. C.6} \]

then $\phi_p(x)$ is symmetric, and further, the arbitrary symmetric polynomial $\psi(x)$ can be written as a linear combination of a finite number of such polynomials. That is

\[ \psi(x) = \sum_p a_p \phi_p(x). \]

A transposition of the ordered list of n variables $x_k$, $1 \le k \le n$, is a permutation which exchanges the positions of any two of the $x_k$'s. Every permutation of the ordered list can be written as a composition of transpositions applied to $(1, 2, \ldots, n)$, and for any specified permutation, if any such composition includes an odd number of transpositions, then every such composition includes an odd number of transpositions. Similarly, if any such composition includes an even number of transpositions, then every such composition includes an even number of transpositions. A permutation is designated odd or even depending upon whether its decomposition into transpositions yields an odd or even number of factors respectively. If n > 1, then exactly n!/2 odd and n!/2 even permutations exist [MoSt64].
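
The parity classification can be made concrete with a short Python sketch (an illustration only; the function name is hypothetical). It sorts a permutation of (0, 1, ..., n-1) by explicit exchanges, counts the transpositions used, and tallies odd and even permutations for a small n.

```python
from itertools import permutations
from math import factorial


def transposition_count(perm):
    """Number of exchanges used to sort perm (a permutation of 0..n-1).

    The count depends on the decomposition chosen, but its parity does not.
    """
    work = list(perm)
    swaps = 0
    for i in range(len(work)):
        while work[i] != i:
            j = work[i]
            # Move the value j to position j, its sorted location.
            work[i], work[j] = work[j], work[i]
            swaps += 1
    return swaps


n = 4
odd = sum(1 for p in permutations(range(n)) if transposition_count(p) % 2)
even = factorial(n) - odd
```

For n = 4 the tally gives 12 odd and 12 even permutations, i.e. n!/2 of each.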

A polynomial $\gamma$ in n variables possessing the property that

\[ \gamma(x_1, x_2, \ldots, x_n) = -\gamma(x_{\alpha_1}, x_{\alpha_2}, \ldots, x_{\alpha_n}) \qquad \text{Eq. C.7} \]

for every odd permutation $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ of its argument list is an alternating polynomial. It follows that any sum or difference of alternating polynomials is an alternating polynomial, that the product of a symmetric polynomial and an alternating polynomial is an alternating polynomial and that the product of any odd number of alternating polynomials is an alternating polynomial. The product of any even number of alternating polynomials is symmetric.

If the alternating polynomial $\gamma(x) = \gamma(x_1, x_2, \ldots, x_n)$ includes the monomial

\[ a\, x_1^{p_1} x_2^{p_2} \cdots x_n^{p_n} \]

among its terms, then it includes also the monomial

\[ (-1)^{s(\alpha)}\, a\, x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n} \]

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ is an arbitrary permutation of the integers $(1, 2, \ldots, n)$ and where $s(\alpha)$ is the number of transpositions in the permutation $\alpha$. If for a given $p = (p_1, p_2, \ldots, p_n)$ the sum of all distinct monomials of this form is designated by

\[ \rho_p(x) = \sum (-1)^{s(\alpha)} x_{\alpha_1}^{p_1} x_{\alpha_2}^{p_2} \cdots x_{\alpha_n}^{p_n}, \qquad \text{Eq. C.8} \]

then $\rho_p(x)$ is alternating, and further, the arbitrary alternating polynomial $\gamma(x)$ can be written as a linear combination of a finite number of such polynomials. That is

\[ \gamma(x) = \sum_p a_p \rho_p(x). \]

As pointed out in the concluding paragraph of Section C.2, the polynomial function (defined by the product in Eq. C.4) which represents the value of an order n Vandermonde determinant alternates sign with each exchange of two variables in its argument
list.  Thus,  it  is  an  alternating  polynomial.  In  fact  the  polynomial  function  which 
represents  a  Vandermonde  determinant  is  an  elementary  alternating  polynomial  in  the 
sense  defined  by  the  following  theorem,  proof  of  which  is  provided  in  [Aitk54]  and 
[Muir60]. 

Theorem C.1: If $\gamma$ is an alternating polynomial in the ordered list of n variables $x_k$, $1 \le k \le n$, then $\gamma$ can be written as

\[ \gamma(x_1, x_2, \ldots, x_n) = \psi(x_1, x_2, \ldots, x_n) \times \prod_{k = 2,\; j < k}^{n} (x_k - x_j) = \psi(x) \times |V_n(x)| \]

where $\psi$ is a symmetric polynomial.
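
Theorem C.1 can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions: the exponent vector p = (3, 1, 0), the sample point and the function names are arbitrary choices, not taken from the original work. It builds the alternating sum of Eq. C.8, divides it by the product of Eq. C.4, and checks that the quotient is unchanged under every permutation of the arguments while the numerator changes sign with parity. Agreement at sample points supports, but does not prove, the factorization.

```python
from fractions import Fraction
from itertools import permutations


def perm_sign(perm):
    """Sign of a permutation, computed from its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def alternating_sum(exponents, xs):
    """Signed sum over all permutations alpha of prod x[alpha_i] ** p_i."""
    total = 0
    for alpha in permutations(range(len(xs))):
        term = perm_sign(alpha)
        for position, power in enumerate(exponents):
            term *= xs[alpha[position]] ** power
        total += term
    return total


def vandermonde_product(xs):
    """Product of (x_k - x_j) over all j < k."""
    prod = 1
    for k in range(1, len(xs)):
        for j in range(k):
            prod *= xs[k] - xs[j]
    return prod


def symmetric_quotient(exponents, xs):
    """The factor psi(x) of Theorem C.1, evaluated at the point xs."""
    return Fraction(alternating_sum(exponents, xs), vandermonde_product(xs))


p = (3, 1, 0)        # illustrative exponent vector
point = (2, 3, 5)    # illustrative sample point with distinct coordinates
base_quotient = symmetric_quotient(p, point)
base_gamma = alternating_sum(p, point)
```

For this choice of p the quotient works out to $-(x_1 + x_2 + x_3)$, which is symmetric as the theorem requires.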

C.4  Quasi-Symmetric  (and  Quasi-Alternating)  Polynomials 
The definitions of symmetric and alternating polynomials supplied in Section C.3 require that the relevant properties (Eq. C.5 and Eq. C.7) apply for all n! permutations of the integers $(1, 2, \ldots, n)$. This section generalizes those notions to multivariate analogs, suitable for the discussion presented in Section 9.5. The generalization amounts to restricting the applicability of Eq. C.5 and C.7 to a subset of the n! permutations of $(1, 2, \ldots, n)$. The resulting polynomial classes are referred to here as quasi-symmetric and quasi-alternating polynomials respectively.

Let $\psi$ be a polynomial function of n = mk scalar variables $x_{ij}$, $1 \le i \le m$, $1 \le j \le k$, and let $\psi$ be denoted $\psi(x_1, x_2, \ldots, x_m)$ where $x_i$ is a k-component vector composed from the scalars $x_{ij}$, $1 \le j \le k$. Then, $\psi$ is quasi-symmetric if

\[ \psi(x_1, x_2, \ldots, x_m) = \psi(x_{\alpha_1}, x_{\alpha_2}, \ldots, x_{\alpha_m}) \qquad \text{Eq. C.9} \]

for all permutations $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ of the integers $(1, 2, \ldots, m)$. The set of all m! such permutations can be placed in one-to-one correspondence with a subset of the set of all n! = (mk)! permutations of $(1, 2, \ldots, n)$.
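
The distinction between quasi-symmetry and full symmetry can be seen in a small Python sketch. The polynomial `psi`, the sample values and the function names are illustrative assumptions, not taken from the original work. With m = 3 vectors of k = 2 components, `psi` is invariant when whole vectors are permuted (Eq. C.9) but not when scalars are exchanged across vectors.

```python
from itertools import permutations


def psi(blocks):
    """Illustrative polynomial: sum over blocks of x_i1 * x_i2 ** 2."""
    return sum(b[0] * b[1] ** 2 for b in blocks)


def invariant_under_block_permutations(f, blocks):
    """Check Eq. C.9 at one point: f is unchanged by every reordering of blocks."""
    base = f(blocks)
    return all(f([blocks[i] for i in alpha]) == base
               for alpha in permutations(range(len(blocks))))


blocks = [(1, 2), (3, 5), (7, 11)]            # m = 3 vectors, k = 2 components
quasi_ok = invariant_under_block_permutations(psi, blocks)

# Exchange the scalars x_12 and x_21: a permutation of the n = 6 scalars
# that is not induced by any permutation of the 3 vectors.
scalar_swapped = [(1, 3), (2, 5), (7, 11)]
fully_symmetric_here = psi(scalar_swapped) == psi(blocks)
```

Here `quasi_ok` is true while `fully_symmetric_here` is false, so `psi` behaves as a quasi-symmetric polynomial that is not symmetric in the sense of Section C.3.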

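If the quasi-symmetric polynomial $\psi(x) = \psi(x_1, x_2, \ldots, x_m)$ includes the monomial

\[ a \left(x_{11}^{p_{11}} x_{12}^{p_{12}} \cdots x_{1k}^{p_{1k}}\right) \left(x_{21}^{p_{21}} x_{22}^{p_{22}} \cdots x_{2k}^{p_{2k}}\right) \cdots \left(x_{m1}^{p_{m1}} x_{m2}^{p_{m2}} \cdots x_{mk}^{p_{mk}}\right) \]

among its terms, then it includes also the monomial

\[ a \left(x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}}\right) \left(x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}}\right) \cdots \left(x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}}\right) \]

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ is an arbitrary permutation of the integers $(1, 2, \ldots, m)$. If for a given

\[ p = (p_1, p_2, \ldots, p_m) = (p_{11}, p_{12}, \ldots, p_{1k}, p_{21}, p_{22}, \ldots, p_{2k}, \ldots, p_{m1}, p_{m2}, \ldots, p_{mk}) \]

the sum of all distinct monomials of this form is designated by

\[ \phi_p(x) = \sum \left(x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}}\right) \left(x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}}\right) \cdots \left(x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}}\right), \qquad \text{Eq. C.10} \]

then $\phi_p(x)$ is quasi-symmetric, and further, the arbitrary quasi-symmetric polynomial $\psi(x)$ can be written as a linear combination of a finite number of such polynomials. That is

\[ \psi(x) = \sum_p a_p \phi_p(x). \]
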

Eq. C.9 and C.10 are analogous to Eq. C.5 and C.6. The counterparts of Eq. C.7 and Eq. C.8 can be developed by straightforward extension of these results. Thus, let $\gamma$ be a polynomial function of n = mk scalar variables $x_{ij}$, $1 \le i \le m$, $1 \le j \le k$, and let $\gamma$ be denoted $\gamma(x_1, x_2, \ldots, x_m)$ where $x_i$ is a k-component vector composed from the scalars $x_{ij}$, $1 \le j \le k$. Then, $\gamma$ is quasi-alternating if

\[ \gamma(x_1, x_2, \ldots, x_m) = -\gamma(x_{\alpha_1}, x_{\alpha_2}, \ldots, x_{\alpha_m}) \qquad \text{Eq. C.11} \]

for every odd permutation $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ of the integers $(1, 2, \ldots, m)$. The set of all m!/2 such permutations can be placed in one-to-one correspondence with a subset of the set of all n!/2 = (mk)!/2 odd permutations of $(1, 2, \ldots, n)$.
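
If the quasi-alternating polynomial $\gamma(x) = \gamma(x_1, x_2, \ldots, x_m)$ includes the monomial

\[ a \left(x_{11}^{p_{11}} x_{12}^{p_{12}} \cdots x_{1k}^{p_{1k}}\right) \left(x_{21}^{p_{21}} x_{22}^{p_{22}} \cdots x_{2k}^{p_{2k}}\right) \cdots \left(x_{m1}^{p_{m1}} x_{m2}^{p_{m2}} \cdots x_{mk}^{p_{mk}}\right) \]

among its terms, then it includes also the monomial

\[ (-1)^{s(\alpha)}\, a \left(x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}}\right) \left(x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}}\right) \cdots \left(x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}}\right) \]

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ is an arbitrary permutation of the integers $(1, 2, \ldots, m)$ and where $s(\alpha)$ is the number of transpositions in the permutation $\alpha$. If for a given

\[ p = (p_1, p_2, \ldots, p_m) = (p_{11}, p_{12}, \ldots, p_{1k}, p_{21}, p_{22}, \ldots, p_{2k}, \ldots, p_{m1}, p_{m2}, \ldots, p_{mk}) \]

the sum of all distinct monomials of this form is designated by

\[ \rho_p(x) = \sum (-1)^{s(\alpha)} \left(x_{\alpha_1 1}^{p_{11}} x_{\alpha_1 2}^{p_{12}} \cdots x_{\alpha_1 k}^{p_{1k}}\right) \left(x_{\alpha_2 1}^{p_{21}} x_{\alpha_2 2}^{p_{22}} \cdots x_{\alpha_2 k}^{p_{2k}}\right) \cdots \left(x_{\alpha_m 1}^{p_{m1}} x_{\alpha_m 2}^{p_{m2}} \cdots x_{\alpha_m k}^{p_{mk}}\right), \qquad \text{Eq. C.12} \]

then $\rho_p(x)$ is quasi-alternating, and further, the arbitrary quasi-alternating polynomial $\gamma(x)$ can be written as a linear combination of a finite number of such polynomials. That is

\[ \gamma(x) = \sum_p a_p \rho_p(x). \]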


APPENDIX  D 
COMPUTER  LISTINGS 
D.1 Introduction
This appendix includes listings of the computer programs used to generate the simulation data presented in Section 5. These programs were developed on the Eglin AFB, FL Cray Y-MP. The programs are all written in Fortran and in some cases they employ Cray extensions to the Fortran standard. The listings are separated into two subsections, one including main program listings and a second including the contents of a subprogram library accessed by the main programs. The library procedures section also includes a library table of contents.

D.2  Main  Program  Listings 

      PROGRAM GET_NPS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_NPS
C
C     Purpose: Compute (via binomial coefficient) and output
C              the cardinality of the indicated S' and S'' sets
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare local variables
C
      INTEGER M, L, NN, NP, NPP
      DOUBLE PRECISION FNP, FNPP
C
C     Loop over M = 1 to M = 8
C
      DO 2 M = 1, 8
         WRITE( 6, * ) 'M = ', M
C
C     Get the answers and write them to stdout
C
         DO 1 L = 1, 8
            NN = 2**L + M
            FNP = 1.0
            FNPP = 1.0
            DO 4 J = 0, M - 1
               FNP = FNP*FLOAT( NN - 1 - J )/FLOAT( M - J )
    4       FNPP = FNPP*FLOAT( NN - J )/FLOAT( M - J )
            NP = 0.5 + FNP
            NPP = 0.5 + FNPP
    1    WRITE( 6, 3 ) L, 2**L, NP, NPP
    2 WRITE( 6, * )
    3 FORMAT( 5H L = , I4, 5H N = , I8,
     &        6H NP = , I22, 7H NPP = , I22 )
C
C     Finished
C
      END
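
The products accumulated in GET_NPS are binomial coefficients: with N = 2**L, the loop forms C(N + M - 1, M) for NP and C(N + M, M) for NPP. The Python sketch below is illustrative only and is not part of the original Fortran library; the function name is hypothetical. It evaluates the same quantities with exact integer arithmetic, which avoids the floating-point product and the 0.5 rounding offset.

```python
from math import comb


def state_space_sizes(m, l):
    """Return (NP, NPP) as computed by GET_NPS for population size m
    and bit-string length l, using exact binomial coefficients."""
    n = 2 ** l
    np_size = comb(n + m - 1, m)   # product of (NN - 1 - J)/(M - J)
    npp_size = comb(n + m, m)      # product of (NN - J)/(M - J)
    return np_size, npp_size


sizes_m6_l4 = state_space_sizes(6, 4)
sizes_m4_l5 = state_space_sizes(4, 5)
```

The first components reproduce the NP values hard-coded in the listings that follow: 54264 for M = 6, L = 4 (GET_P2INS and GET_P3INS) and 52360 for M = 4, L = 5 (GET_3STAT).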

      PROGRAM GET_SPS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_SPS
C
C     Purpose: Compute and output the indicated S' sets
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare local variables
C
      INTEGER M, L, NP, LM, LL, LN, J
      INTEGER SP( 4*5984 ), NBAR( 16 )
C
C     Prompt for and read ranges
C
      WRITE( 6, * ) ' Maximum M? '
      READ( 5, '(I8)' ) M
      WRITE( 6, * ) ' Maximum L? '
      READ( 5, '(I8)' ) L
C
C     Get the answers and write them to stdout
C
      DO 3 LM = 1, M
         DO 2 LL = 1, L
            CALL GET_SP( LM, LL, SP, NP )
            WRITE( 6, * )
            WRITE( 6, 4 ) LM, LL, 2**LL, NP
            DO 1 LN = 1, NP
               CALL UNPACK_NBARP( LM, LL,
     &                            SP( ( LN - 1 )*LM + 1 ), NBAR )
    1       WRITE( 6, 5 ) ( NBAR( J ), J = 1, 2**LL )
    2    CONTINUE
    3 CONTINUE
C
C     FORMAT statements 4 and 5 are illegible in the source listing.
C     The versions below are assumed, patterned on GET_NPS.
C
    4 FORMAT( 5H M = , I4, 5H L = , I4, 5H N = , I8,
     &        6H NP = , I22 )
    5 FORMAT( 16I4 )
C
C     Finished
C
      END
      PROGRAM GET_R
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_R
C
C     Purpose: Generate the indicated reward function and write
C              it to disk
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare local variables
C
      INTEGER L, SCHEMA, SCHEMA_MASK, DELTA
      REAL R( 0:2**8 - 1 ), WEIGHT
      CHARACTER*8 SCHEMAC
C
C     Clear R (added on the assumption that storage is not
C     implicitly zeroed)
C
      DATA R/256*0.0/
C
C     Prompt for and read the bit-string length
C
      WRITE( 6, * ) 'Bit-string length? '
      READ( 5, * ) L
C
C     Loop over all schemata and retrieve the associated weight
C
      DO 5 K = 1, 3**L
         DO 1 N = 1, L
            INDICATOR = MOD( ( K - 1 )/3**( N - 1 ), 3 )
            IF( INDICATOR .EQ. 0 ) THEN
               SCHEMAC( N:N ) = '*'
            ELSE IF( INDICATOR .EQ. 1 ) THEN
               SCHEMAC( N:N ) = '0'
            ELSE
               SCHEMAC( N:N ) = '1'
            END IF
    1    CONTINUE
         WRITE( 6, * ) SCHEMAC( 1:L )
         READ( 5, * ) WEIGHT
C
C     Build the schema and schema mask
C
         SCHEMA = 0
         SCHEMA_MASK = 0
         DO 3 N = 1, L
            DELTA = 2**( N - 1 )
            IF( SCHEMAC( N:N ) .EQ. '1' ) THEN
               SCHEMA = SCHEMA + DELTA
               SCHEMA_MASK = SCHEMA_MASK + DELTA
            ELSE IF( SCHEMAC( N:N ) .EQ. '0' ) THEN
               SCHEMA_MASK = SCHEMA_MASK + DELTA
            END IF
    3    CONTINUE
C
C     Now add required contributions to R
C
         DO 4 I = 0, 2**L - 1
            IF( AND( SCHEMA_MASK, XOR( I, SCHEMA ) ) .EQ. 0 )
     &         R( I ) = R( I ) + WEIGHT
    4    CONTINUE
    5 CONTINUE
C
C     Open the output data file and write R
C
      OPEN( 1, FILE='RDATA', STATUS='NEW' )
      WRITE( 1, '(8F10.6)' ) ( R( J ), J = 0, 2**L - 1 )
C
C     Finished
C
      END
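
The mask-and-XOR test used in GET_R to decide whether a bit string belongs to a schema can be sketched in Python as follows. This is an illustration under stated assumptions, not a transcription of the program: the function names and the example weights are hypothetical, and weights are supplied as a dictionary rather than read interactively for all 3**L schemata. As in the listing, character position N of the schema corresponds to the bit of weight 2**(N - 1).

```python
def schema_to_bits(schema):
    """Return (value, mask): mask marks the defined positions of the schema,
    value holds the required bits at those positions."""
    value = 0
    mask = 0
    for position, symbol in enumerate(schema):
        delta = 1 << position
        if symbol == '1':
            value += delta
            mask += delta
        elif symbol == '0':
            mask += delta
        # '*' leaves both value and mask unchanged (don't care)
    return value, mask


def matches(string_index, schema):
    """True if the bit string agrees with the schema at every defined position."""
    value, mask = schema_to_bits(schema)
    return (mask & (string_index ^ value)) == 0


def reward_table(length, weights):
    """Accumulate, for each bit string, the weights of all schemata it matches."""
    r = [0.0] * (2 ** length)
    for schema, weight in weights.items():
        for i in range(2 ** length):
            if matches(i, schema):
                r[i] += weight
    return r


# Hypothetical weights for a length-3 example.
example = reward_table(3, {'1**': 1.0, '*0*': 0.5})
```

In this example strings with bit 0 set receive 1.0, strings with bit 1 clear receive 0.5, and strings satisfying both receive 1.5.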


      PROGRAM GET_P2INS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_P2INS
C
C     Purpose: Compute and return the indicated conditional
C              probability arrays
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare problem defining parameters
C
      INTEGER M, L, NP, NALPHA
      PARAMETER ( M = 6, L = 4, NP = 54264, NALPHA = 10 )
      INTEGER SP( M, NP )
      REAL R( 0:2**L - 1 ), P2IN( 0:2**L - 1 )
      INTEGER NCOUNTS( 4 )
      DATA NCOUNTS/1,3096,3100,54181/
      CHARACTER*8 P2INDATA
      DATA P2INDATA/'P2INDATA'/
C
C     Declare local variables
C
      INTEGER NBAR( 0:15 )
      REAL ALPHA
C
C     Get the objective function values
C
      OPEN( 1, FILE='RDATA', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( I ), I = 0, 2**L - 1 )
      CLOSE( 1 )
C
C     Open the output data file and write the summary data
C
      OPEN( 1, FILE=P2INDATA, STATUS='NEW' )
      WRITE( 1, '(4I8)' ) M, L, NP, NALPHA
      WRITE( 1, '(8F10.6)' ) R
      WRITE( 1, * )
C
C     Build the state space array
C
      CALL GET_SP( M, L, SP )
C
C     Generate the required P2IN data
C
      DO 2 NCOUNT = 1, 4
         CALL UNPACK_NBARP( M, L, SP( 1, NCOUNTS( NCOUNT ) ), NBAR )
         WRITE( 1, '(16I2)' ) NBAR
         DO 1 I = 0, NALPHA - 1
            ALPHA = 1.0/2**I
            WRITE( 1, '(F10.6)' ) ALPHA
            CALL GET_P2IN( M, L, SP( 1, NCOUNTS( NCOUNT ) ),
     &                     ALPHA, R, P2IN )
            WRITE( 1, '(8F10.6)' ) P2IN
    1    WRITE( 1, * )
    2 CONTINUE
C
C     Finished
C
      END

      PROGRAM GET_P3INS
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_P3INS
C
C     Purpose: Compute and return the indicated conditional
C              probability arrays
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare problem defining parameters
C
      INTEGER M, L, NP, NALPHA
      PARAMETER ( M = 6, L = 4, NP = 54264, NALPHA = 10 )
      INTEGER SP( M, NP )
      REAL R( 0:2**L - 1 ), P3IN( 0:2**L - 1 )
      INTEGER NCOUNTS( 4 )
      DATA NCOUNTS/1,3096,3100,54181/
      CHARACTER*8 P3INDATA
      DATA P3INDATA/'P3INDATA'/
C
C     Declare local variables
C
      INTEGER NBAR( 0:15 )
      REAL ALPHA
C
C     Get the objective function values
C
      OPEN( 1, FILE='RDATA', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( I ), I = 0, 2**L - 1 )
      CLOSE( 1 )
C
C     Open the output data file and write the summary data
C
      OPEN( 1, FILE=P3INDATA, STATUS='NEW' )
      WRITE( 1, '(4I8)' ) M, L, NP, NALPHA
      WRITE( 1, '(8F10.6)' ) R
      WRITE( 1, * )
C
C     Build the state space array
C
      CALL GET_SP( M, L, SP )
C
C     Generate the required P3IN data
C
      DO 2 NCOUNT = 1, 4
         CALL UNPACK_NBARP( M, L, SP( 1, NCOUNTS( NCOUNT ) ), NBAR )
         WRITE( 1, '(16I2)' ) NBAR
         DO 1 I = 0, NALPHA - 1
            ALPHA = 1.0/2**I
            WRITE( 1, '(F10.6)' ) ALPHA
            CALL GET_P3IN( M, L, SP( 1, NCOUNTS( NCOUNT ) ),
     &                     ALPHA, R, P3IN )
            WRITE( 1, '(8F10.6)' ) P3IN
    1    WRITE( 1, * )
    2 CONTINUE
C
C     Finished
C
      END

      PROGRAM GET_3STAT
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_3STAT
C
C     Purpose: Compute the indicated three-operator stationary
C              distribution
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare defining parameters and output file name
C
      INTEGER M, L, NP
      PARAMETER ( M = 4, L = 5, NP = 52360 )
      REAL ALPHA, ALPHA0, R( 0:2**L - 1 ), TDELTA
      INTEGER NALPHA
      PARAMETER ( ALPHA0 = 1.0/2**20, NALPHA = 1, TDELTA = 0.004 )
      CHARACTER*10 TESTFILE
      DATA TESTFILE/'3TEST45'/
C
C     Declare state space associated arrays
C
      INTEGER SP( M, NP ), SPA( 0:2**L - 1 ), MULTI( NP )
      REAL QBAR( NP, 0:1 )
C
C     Declare some local variables
C
      REAL LDELTA
      INTEGER TOGGLE, LOOP_COUNT
C
C     Get the objective function data
C
      OPEN( 1, FILE='RDATA4', STATUS='OLD' )
      READ( 1, '(8F10.6)' ) ( R( J ), J = 0, 2**L - 1 )
      CLOSE( 1 )
C
C     Open the output data file and write the summary output data
C
      OPEN( 1, FILE=TESTFILE, STATUS='NEW', FORM='UNFORMATTED' )
      WRITE( 1 ) M, L, NP
      WRITE( 1 ) ALPHA0, NALPHA, TDELTA
      WRITE( 1 ) R
C
C     Generate and store the state space set, S'
C
      CALL GET_SP( M, L, SP )
C
C     Next, get the indices of the absorbing states in S'
C
      CALL GET_SPA( M, L, NP, SP, SPA )
C
C     Now get the associated multinomial coefficient array
C
      CALL GET_MULTI( M, L, NP, SP, MULTI )
C
C     Compute for each required ALPHA
C
      DO 3 K = 0, NALPHA - 1
C
C     Initialize ALPHA and QBAR
C
         ALPHA = ALPHA0/2**K
         CALL INIT_QBAR( M, L, NP, MULTI, QBAR )
C
C     Loop until the tolerance parameter is met
C
         TOGGLE = 0
         LOOP_COUNT = 0
         LDELTA = 1.0
         DO 1 WHILE ( LDELTA .GT. TDELTA )
            CALL GET_3QBAR( M, L, NP, ALPHA, R, MULTI,
     &                      SP, SPA, QBAR( 1, TOGGLE ),
     &                      QBAR( 1, MOD( TOGGLE + 1, 2 ) ),
     &                      LDELTA )
            TOGGLE = MOD( TOGGLE + 1, 2 )
            LOOP_COUNT = LOOP_COUNT + 1
            WRITE( 6, * ) LOOP_COUNT, TDELTA, LDELTA
    1    CONTINUE
C
C     Output the termination information and the final vector
C
         WRITE( 1 ) ALPHA, LOOP_COUNT - 1, LDELTA
         DO 2 I = 1, NP/40
    2    WRITE( 1 ) ( QBAR( 40*( I - 1 ) + J, TOGGLE ), J = 1, 40 )
         WRITE( 1 ) ( QBAR( J, TOGGLE ), J = 40*( NP/40 ) + 1, NP )
    3 CONTINUE
C
C     Last write the absorbing state vector values
C
      WRITE( 1 ) ( QBAR( SPA( I ), TOGGLE ), I = 0, 2**L - 1 )
C
C     Finished
C
      END


D.3  Library  Listings 


stat.o:GET_SP 

stat.o:GET_SPA 

stat.o:GET_MULTI 

stat.o:INIT_QBAR 

stat.o:GET_3QBAR 

stat.o:INIT_NBARP 

stat.o:GET_NBARP 

stat.o:GET_NFAC 

stat.o:GET_P3MN 

stat.o:GET_P3IN 

stat.o:GET_PlIN 

stat.o:UNPACK_NBARP 

      SUBROUTINE GET_SP( M, L, SP, NP )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_SP
C
C     Purpose: Generate S'
C
C     Note: The fourth argument (NP) is optional
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, SP( M, 0:* ), NP
C
C     Declare local variables
C
      INTEGER NCOUNT, NBARP( M ), I, NARG
      LOGICAL LSTAT
C
C     Initialize NBARP
C
      CALL INIT_NBARP( M, NBARP )
C
C     Loop until S' is complete
C
      NCOUNT = 0
      LSTAT = .TRUE.
      DO 2 WHILE( LSTAT )
C
C     Set this element in S'
C
         DO 1 I = 1, M
    1    SP( I, NCOUNT ) = NBARP( I )
C
C     Get the next one and increment the counter
C
         CALL GET_NBARP( M, L, NBARP, LSTAT )
    2 NCOUNT = NCOUNT + 1
C
C     Test argument count to determine whether to set NP
C
      NARG = NUMARG()
      IF( NARG .EQ. 4 ) NP = NCOUNT
C
C     Return to caller
C
      RETURN
      END

      SUBROUTINE GET_SPA( M, L, NP, SP, SPA )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_SPA
C
C     Purpose: Generate a table of absorbing state indices in S'
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NP, SP( M, NP ), SPA( 0:2**L - 1 )
C
C     Declare some local variables
C
      INTEGER I, J, K, JSTART
C
C     Initialize
C
      JSTART = 1
C
C     Loop over all I in S
C
      DO 3 I = 1, 2**L
C
C     Loop over all J in S'
C
         DO 2 J = JSTART, NP
C
C     Test SP until an exit condition is satisfied
C
            DO 1 K = 1, M
               IF( SP( K, J ) .NE. I ) GO TO 2
    1       CONTINUE
C
C     Exhausting M signals an absorbing state
C     Assign it and go after the next one
C
            SPA( I - 1 ) = J
            JSTART = J + 1
            GO TO 3
C
C     Exit to label 2 means that this J is not an
C     absorbing state
C
    2    CONTINUE
    3 CONTINUE
C
C     Return to caller
C
      RETURN
      END
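
The roles of GET_SP and GET_SPA can be illustrated with a Python sketch. It rests on an assumption that the portion of the listing shown here does not confirm: that the successor rule in GET_NBARP enumerates the nondecreasing M-tuples over (1, ..., 2**L) in lexicographic order, starting from the all-ones tuple set by INIT_NBARP. The function names are hypothetical. Under that assumption, S' is the set of size-M multisets, and the states located by GET_SPA are those whose M entries are all equal.

```python
from itertools import combinations_with_replacement
from math import comb


def enumerate_states(m, l):
    """Assumed analogue of GET_SP: all nondecreasing m-tuples over 1..2**l,
    in lexicographic order."""
    n = 2 ** l
    return list(combinations_with_replacement(range(1, n + 1), m))


def absorbing_indices(states):
    """Analogue of GET_SPA: 1-based indices of states with all entries equal."""
    return [index for index, state in enumerate(states, start=1)
            if len(set(state)) == 1]


states = enumerate_states(2, 2)     # M = 2, L = 2, so N = 4
spa = absorbing_indices(states)
```

For M = 2 and L = 2 this yields 10 states, matching C(N + M - 1, M), with the uniform populations at positions 1, 5, 8 and 10. The increasing order of those positions is what allows the search in GET_SPA to resume from JSTART rather than from the start of the table.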

      SUBROUTINE GET_MULTI( M, L, NP, SP, MULTI )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: GET_MULTI
C
C     Purpose: Get the multinomial coefficient table for the
C              supplied S'
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NP, SP( M, NP ), MULTI( NP )
C
C     Declare local variables and function reference
C
      INTEGER MFAC, NBAR( 0:2**L - 1 ), NCOUNT
      INTEGER I, GET_NFAC
C
C     Compute and save M!
C
      MFAC = GET_NFAC( M )
C
C     Loop over all vectors in S'
C
      DO 2 NCOUNT = 1, NP
C
C     Set MULTI( NCOUNT ) = M!
C
         MULTI( NCOUNT ) = MFAC
C
C     Now unpack this NBARP
C
         CALL UNPACK_NBARP( M, L, SP( 1, NCOUNT ), NBAR )
C
C     Loop over the denominator factorials
C
         DO 1 I = 0, 2**L - 1
    1    MULTI( NCOUNT ) = MULTI( NCOUNT )/GET_NFAC( NBAR( I ) )
C
C     Close the loop
C
    2 CONTINUE
C
C     Return to caller
C
      RETURN
      END


      SUBROUTINE INIT_QBAR( M, L, NP, MULTI, QBAR )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Name: INIT_QBAR
C
C     Purpose: Initialize the QBAR probability vectors
C
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C
C     Declare calling arguments
C
      INTEGER M, L, NP, MULTI( NP )
      REAL QBAR( NP, 0:1 )
C
C     Declare a local variable
C
      REAL FRACTION
C
C     Set the first QBAR vector to its ALPHA = 1 value and
C     zero the second
C
      FRACTION = 1.0/( 2**( M*L ) )
      DO 1 I = 1, NP
         QBAR( I, 0 ) = MULTI( I )*FRACTION
    1 QBAR( I, 1 ) = 0.0
C
C     Return to caller
C
      RETURN
      END
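
The initialization in INIT_QBAR relies on the multinomial coefficients from GET_MULTI summing to N**M = 2**(M*L) over S', so that MULTI( I )/2**(M*L) is a probability vector. The Python sketch below illustrates this; it is not part of the original library, the function names are hypothetical, and the enumeration of S' as nondecreasing tuples is an assumption about the state ordering.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial


def multinomial(population):
    """M! divided by the product of the factorials of the occupancy counts."""
    result = factorial(len(population))
    for count in Counter(population).values():
        result //= factorial(count)
    return result


def initial_qbar(m, l):
    """Assumed analogue of GET_MULTI followed by INIT_QBAR."""
    n = 2 ** l
    states = combinations_with_replacement(range(1, n + 1), m)
    fraction = 1.0 / (2 ** (m * l))
    return [multinomial(state) * fraction for state in states]


qbar0 = initial_qbar(3, 2)      # M = 3, L = 2
```

For example, the population (1, 1, 2, 3) has coefficient 4!/(2! 1! 1!) = 12, and `qbar0` sums to 1.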

SUBROUTINE  GET_3QBAR(  M,  L,  NP,  ALPHA,  R,  MULTI, 
&  SP,  SPA,  QBARO,  QBARl,  LDELTA) 

ccccccccccccccccccccccccccccccccccccccccccccccccccccc 
c 

C         Name:  GET_3QBAR 

C 

C         Purpose:  Compute  and  return  the  indicated  three-operator 

C  stationary  distribution  vector  transformation 

C 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

c 

C         Declare  calling  arguments 
C 
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INTEGER  M,  L,  NP,  MULT1(  NP  ),  SP(  M,  NP  ),  SPA(  0:2**L  -  1  ) 

REAL  ALPHA,  R(  0:2**L  -  1  ) 

REAL  QBARO(  NP  ),  QBAR1(  NP  ),  LDELTA 
C 

C        Declare  local  variables 
C 

INTEGER  MCOUNT,  NCOUNT 

REAL  SIGMA,  P3MN(  NP  ) 
C 

C        Clear  QBAR1
C 

DO  1  MCOUNT  =1,NP 

1  QBAR1(  MCOUNT  )  =  0.0
C 

C        Loop  over  all  states  in  S' 
C 

DO  3  NCOUNT  =  1,  NP
C 

C        Get  P3MN  for  this  NBAR 
C 

CALL  GET_P3MN(  M,  L,  NP,  ALPHA,  R,  MULTI,  NCOUNT,  SP,  P3MN  ) 
C 

C        Accumulate  them 
C 

DO  2  MCOUNT  =  1,  NP

2  QBAR1(  MCOUNT  )  =  QBAR1(  MCOUNT  )  +

&  QBARO(  NCOUNT  )  *  P3MN(  MCOUNT  ) 

C 

C         Close  the  NBAR  loop 
C 

3  CONTINUE 
C 

C        Normalize 
C 

SIGMA  =  0.0 

DO  4  MCOUNT  =  1,NP 

4  SIGMA  =  SIGMA  +  QBAR1(  MCOUNT  )
DO  5  MCOUNT  =  1,  NP

5  QBAR1(  MCOUNT  )  =  QBAR1(  MCOUNT  )/SIGMA
C 

C        Reset  LDELTA 
C 

SIGMA  =  0.0 

DO  6  I  =  0,  2**L  -  1 

6  SIGMA  =  SIGMA  +  QBAR1(SPA(I)) 
LDELTA  =  1.0  -  SIGMA

C 

C        Return  to  caller 

C 

RETURN 

END 
SUBROUTINE  INIT_NBARP(  M,  NBARP  )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

c 

C         Name:  INIT_NBARP 

C 

C        Purpose:  Initialize  NBARP  to  the  starting  element  of  S'

C 

ccccccccccccccccccccccccccccccccccccccccccccccccccccc 
c 

C         Declare  calling  arguments 
C 

INTEGER  M,  NBARP(  M  ) 
C 

C        Set  all  pointers  in  the  NBARP  array  to  cell  1 
C 

DO  1  I  =  1,  M
1  NBARP(  I )  =  1 

C 

C        Return  to  caller 
C 

RETURN 

END 

SUBROUTINE  GET_NBARP(  M,  L,  NBARP,  NSTAT  ) 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

C 

C         Name:  GET_NBARP 

C 

C        Purpose:  Generate  the  successor  element  in  S' 

C 

C         Note:     (1)  This  procedure  assumes  that  M  and  L  are  in  range 

C  and  that  the  caller  supplied  NBARP  is  valid 

C 

C  (2)  The  fourth  argument  (NSTAT)  is  optional 

C 

ccccccccccccccccccccccccccccccccccccccccccccccccccccc 
c 

C         Declare  calling  arguments 
C 

INTEGER  M,  L,  NBARP(  M  ) 

LOGICAL  NSTAT 
C 

C        Declare  local  variables 
C 

INTEGER  IMAX,  I,  J,  NARG 

LOGICAL  LSTAT 
C 

C        Set  maximum  index 
C 

IMAX  =  2**L 
C 

C        Process  most  frequent  transition 
C 

IF(  NBARP(  M  )  .LT.  IMAX  )  THEN 

NBARP(  M  )  =  NBARP(  M  )  +  1 
LSTAT  =  .TRUE.
C 

C        Next  most  frequent 
C 

ELSE  IF(  NBARP(  1  )  .LT.  IMAX  )  THEN 

DO  1  I  =  M  -  1,  1,  -1

IF(  NBARP(  I )  .LT.  IMAX  )  GO  TO  2 

1  CONTINUE 

2  NBARP(  I )  =  NBARP(  I )  +  1 
DO  3  J  =  I  +  1,  M

3  NBARP(  J  )  =  NBARP(  I ) 
LSTAT  =  .TRUE. 

C 

C        Anything  else  is  terminal 

C 

ELSE 

LSTAT  =  .FALSE. 

END  IF 
C 

C         Test  argument  count  to  determine  whether  to  set  NSTAT 
C 

NARG  =  NUMARG()

IF(  NARG  .EQ.  4  )  NSTAT  =  LSTAT 
C 

C        Return  to  caller 
C 

RETURN 

END 

INTEGER  FUNCTION  GET_NFAC(  N  ) 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
C 

C        Name:  GET_NFAC 
C 

C        Purpose:  Compute  and  return  N! 
C 

C        Note:  This  procedure  assumes  that  N  is  a  nonnegative  integer 
C 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
C 

C        Declare  calling  argument 
C 

INTEGER  N 
C 

C        If  N  =  0  or  N  =  1,  then  N!  =  1
C 

IF(  N  .LE.  1  )  THEN 

GET_NFAC  =  1 
C 

C        Otherwise,  recurse 
C 

ELSE 

GET_NFAC  =  N  *  GET_NFAC(  N  -  1  ) 

END  IF 
C        Return  to  caller
C 

RETURN 

END 

SUBROUTINE  GET_P3MN(  M,  L,  NP,  ALPHA,  R,  MULTI, 
&  NCOUNT,  SP,  P3MN  ) 

ccccccccccccccccccccccccccccccccccccccccccccccccccccc 
c 

C        Name:  GET_P3MN 

C 

C        Purpose:  Compute  and  return  the  indicated  conditional 

C  probability  array 

C 

ccccccccccccccccccccccccccccccccccccccccccccccccccccc 
c 

C         Declare  calling  arguments 
C 

INTEGER  M,  L,  NP,  MULTI(  NP  ),  NCOUNT,  SP(  M,  NP  ) 

REAL  ALPHA,  R(  0:2**L  -  1  ),  P3IN(  0:2**L  -  1  ),  P3MN(  NP  ) 
C 

C         Declare  local  variables 
C 

INTEGER  MCOUNT,  K 
C 

C         Get  P3IN  for  this  NBARP 
C 

CALL  GET_P3IN(  M,  L,  SP(  1,  NCOUNT  ),  ALPHA,  R,  P3IN  ) 
C 

C         Initialize  the  P3MN  vector 
C 

DO  1  MCOUNT  =  1,  NP

1  P3MN(  MCOUNT  )  =  MULTI(  MCOUNT  )
C 

C         Loop  over  the  solutions  represented  in  this  NBARP 
C 

DO  3  J  =  1,  M

DO  2  MCOUNT  =  1,  NP

K  =  SP(  J,  MCOUNT  )  -  1 

2  P3MN(  MCOUNT )  =  P3MN(  MCOUNT  )*P3IN(  K  ) 

3  CONTINUE 
C 

C         Return  to  caller 
C 

RETURN 

END 

SUBROUTINE  GET_P3IN(  M,  L,  NBARP,  ALPHA,  R,  P3IN  ) 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
C 

C        Name:  GET_P3IN 
C 

C         Purpose:  Compute  and  return  the  indicated  conditional 
C  probability  array 

C 
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
c 

C        Declare  calling  arguments 
C 

INTEGER  M,  L,  NBARP(  M  ) 

REAL  ALPHA,  R(  0:2**L  -  1  ),  P3IN(  0:2**L  -  1  ) 
C 

C        Declare  local  variables 
C 

INTEGER  I,  J,  K,  IMAX,  INDX 

REAL  P1IN(  0:2**L  -  1  ),  P3INP(  0:2**L  -  1  ) 

REAL  P2,  SUM 
C 

C        Set  maximum  loop  index 
C 

IMAX  =  2**L  -  1 
C 

C         Clear  the  P3IN  and  P3INP  arrays 
C 

DO  1  I  =  0,  IMAX 

P3IN(  I )  =  0.0 

1  P3INP(  I )  =  0.0 
C 

C        Get  the  P1IN  array
C 

CALL  GET_P1IN(  M,  L,  NBARP,  R,  P1IN  )
C 

C         Build  the  P3INP  array 
C 

DO  4  I  =  0,  IMAX 

DO  3  J  =  0,  IMAX 

P2  =  P1IN(I)*P1IN(J) 

DO  2  K  =  0,  L 

INDX  =  I 

CALL  MVBITS(  J,  0,  K,  INDX,  0 ) 

2  P3INP(  INDX  )  =  P3INP(  INDX  )  +  P2 

3  CONTINUE 

4  CONTINUE 
C 

C         Perturb  with  mutation 
C 

DO  6  J  =  0,  IMAX 

IF(  P3INP(  J  )  .NE.  0.0  )  THEN 
DO  5  I  =  0,  IMAX 

5  P3IN(  I )  =  P3IN(  I )  +  P3INP(  J  )* 

&  ALPHA**POPCNT(  XOR(  I,  J  ) ) 

END  IF 

6  CONTINUE 
C 

C         Now  normalize  them 
C 

SUM  =  0.0 

DO  7  I  =  0,  IMAX 

7  SUM  =  SUM  +  P3IN(  I ) 
DO  8  I  =  0,  IMAX 
8  P3IN(  I  )  =  P3IN(  I  )/SUM

C 

C         Return  to  caller 

C 

RETURN 

END 

SUBROUTINE  GET_P1IN(  M,  L,  NBARP,  R,  P1IN  )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

c 

C         Name:  GET_P1IN 

C 

C         Purpose:  Compute  and  return  the  indicated  conditional 

C  probability  array 

C 

ccccccccccccccccccccccccccccccccccccccccccccccccccccc 
c 

C        Declare  calling  arguments 
C 

INTEGER  M,  L,  NBARP(  M  ) 

REAL  R(  0:2**L  -  1  ),  P1IN(  0:2**L  -  1  ) 
C 

C        Declare  local  variables 
C 

INTEGER  I,  J,  K,  IMAX 

REAL  SUM 
C 

C         Set  maximum  loop  index 
C 

IMAX  =  2**L  -  1 
C 

C        Clear  the  P1IN  vector
C 

DO  1  I  =  0,  IMAX 

1  P1IN(I)  =  0.0 
C 

C         Compute  the  numerators  and  accumulate  the  denominator 
C 

SUM  =  0.0 

DO  2  J  =  1,  M

K  =  NBARP(  J  )  -  1

P1IN(K)  =  P1IN(K)  +  R(K) 

2  SUM  =  SUM  +  R(  K  ) 
C 

C         Now  normalize  them 
C 

DO  3  I  =  0,  IMAX 

3  P1IN(I)  =  P1IN(I)/SUM 
C 

C        Return  to  caller 


C 


RETURN 
END 
SUBROUTINE  UNPACK_NBARP(  M,  L,  NBARP,  NBAR  )
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 
C 

C    Name:  UNPACK_NBARP 
C 

C        Purpose:  Generate  displayable  version  of  packed  NBAR
C 

C         Note:    This  procedure  assumes  that  M  and  L  are  in  range 
C  and  that  the  caller  supplied  NBARP  is  valid

C 

ccccccccccccccccccccccccccccccccccccccccccccccccccccc 
c 

C         Declare  calling  arguments 
C 

INTEGER  M,  L,  NBARP(  M  ),  NBAR(  0:2**L  -  1  ) 
C 

C        Declare  local  variables 
C 

INTEGER  I 
C 

C         Clear  the  NBAR  vector 
C 

DO  1  I  =  0,  2**L  -  1 

1  NBAR(  I )  =  0 
C 

C        Now  set  the  nonzero  components 
C 

DO  2  I  =  1,  M

2  NBAR(  NBARP(  I )  -  1  )  =  NBAR(  NBARP(  I )  -  1  )  +  1 
C 

C        Return  to  caller 
C 

RETURN 

END 